Google recently presented its latest AI model, Gemini 1.5 Pro, billed as the next generation of its AI models. The model is built on a Mixture-of-Experts (MoE) architecture and promises significant advances over comparable models, although how well it performs in practice will only become clear over time. Google positions Gemini 1.5 Pro as a significantly upgraded successor to its predecessors, designed to scale across a broad range of workloads.
Gemini 1.5 Pro: What makes it unique?
"Gemini 1.5 Pro" has a deep understanding of the context across various modalities. According to Google, the Gemini can achieve similar results as the recently released Gemini 1.3 Ultra, but at a much lower computational power. The most notable feature is the ability to continuously process information across a maximum of 1 million tokens, which is the largest context window of any large scale foundational model so far. To give you an idea of how much context is available, consider the context window of 32,000 for Gemini 1.3 models, 128,000 for GPT4 Turbo, and 200,000 for Claude 2,1.Although the model has a default context window of 128,000 tokens, Google is enabling a limited number of developers (and enterprise customers) to try out up to 1 million tokens in the context window. Currently, in the preview mode, Google is enabling developers to use Google's AI Studio, as well as Vertex AI, to try out Gemini 1,5 Pro.
What are the use cases of Gemini 1.5 Pro?
"Gemini 1.5 Pro" claims to be able to handle 700k words or 30k lines of code, which is 35x more than Gemini 1.0. It can also handle 11hrs of audio and 1hrs of video across multiple languages. In demonstration videos shared on Google’s official "YouTube channel", the model’s contextual understanding was demonstrated with a 402 page PDF as the prompt. In the live interaction, the model responded to a prompt that included 326,658 tokens (including 256 tokens of images) for a total of 327,309 tokens."Gemini 1.5 Pro" utilised a 44 minute video, in this case a silent film record of Sherlock Jr., with various multi-modal prompts. The video had a total of 696,161 tokens, 256 of which were for images.