Google Gemini 1.5: What’s New and How It Compares to Bard and Gemini 1.0
Google has announced its next-generation AI model, Gemini 1.5, which brings major improvements over its predecessor, Gemini 1.0, and over Bard, the chatbot that preceded the Gemini branding. Gemini 1.5 is a multimodal model: it can process text, images, audio, and video, and reason about the connections between them. It is built on a new Mixture-of-Experts (MoE) architecture, which lets the model learn to activate only the most relevant expert pathways in its network for a given input, improving both efficiency and performance.
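To make the Mixture-of-Experts idea concrete, here is a minimal toy sketch in Python. This is purely illustrative and is not Gemini's actual architecture: the dimensions, the linear "experts", and the top-k gating rule are all assumptions chosen for readability. The key point it demonstrates is that only the few experts selected by the gate run for each input, which is where MoE's compute savings come from.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed for illustration only).
D, H, NUM_EXPERTS, TOP_K = 8, 16, 4, 2

gate_w = rng.normal(size=(D, NUM_EXPERTS))            # learned gating weights
experts = [rng.normal(size=(D, H)) for _ in range(NUM_EXPERTS)]  # "expert" layers

def moe_forward(x):
    """Route input x (shape [D]) through only the top-k experts."""
    scores = x @ gate_w                               # one score per expert
    top = np.argsort(scores)[-TOP_K:]                 # indices of the best experts
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over top-k
    # Only the selected experts are evaluated; the rest stay idle.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

out = moe_forward(rng.normal(size=D))
print(out.shape)  # (16,)
```

In a real MoE transformer the experts are feed-forward sublayers inside each block and the gate is trained jointly with them, but the routing logic follows this same shape.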
One of the most notable features of Gemini 1.5 is its long-context understanding, which lets the model process up to one million tokens in a single prompt. A token is a small unit of input, such as a word, part of a word, a character, or a piece of an image. The more tokens a model can handle at once, the more information it can take in, and the better grounded its responses can be. For comparison, Bard and Gemini 1.0 were limited to a context of 32,000 tokens. Gemini 1.5 can process up to 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words, roughly the length of a long novel.
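A rough sketch can show why the jump from 32,000 to one million tokens matters. Real models use subword tokenizers, and the 4-characters-per-token rule of thumb below is only a common ballpark for English text, not Gemini's actual tokenizer:

```python
CHARS_PER_TOKEN = 4  # rough heuristic for English text (assumption, not Gemini's tokenizer)

def estimate_tokens(text: str) -> int:
    """Ballpark token count for a piece of English text."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(text: str, context_window: int) -> bool:
    """Does the text fit in a model's context window of the given size?"""
    return estimate_tokens(text) <= context_window

short_story = "word " * 4000                           # ~4,000 words of text
print(estimate_tokens(short_story))                    # 5000 estimated tokens
print(fits_in_context(short_story, 32_000))            # True: fits the old 32k window
print(fits_in_context(short_story * 100, 32_000))      # False: far too long for 32k
print(fits_in_context(short_story * 100, 1_000_000))   # True: fits a 1M-token window
```

The last two lines make the practical difference visible: material on the order of a long novel overflows a 32,000-token window but sits comfortably inside a one-million-token one.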
This long-context understanding lets Gemini 1.5 perform tasks that were previously impossible or very difficult for AI models, such as summarizing a feature-length movie, translating an entire book, or answering detailed questions about an hour of video from a single prompt. Gemini 1.5 can also answer complex questions that require multimodal reasoning, such as “What is the name of the song that plays in the background of this video?” or “What is the main theme of this book?”.
Gemini 1.5 also outperforms Bard and Gemini 1.0 on benchmarks that measure accuracy, fluency, and creativity. For instance, it scored higher on the Needle In A Haystack (NIAH) evaluation, which tests a model’s ability to find a specific piece of information buried in a very long body of text, and on the Machine Translation from One Book (MTOB) benchmark, which tests a model’s ability to learn to translate a new language from a single book.
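The idea behind the NIAH evaluation can be sketched in a few lines. This toy harness buries a "needle" sentence at a random position in a long filler text and checks that a retriever can recover it; the names and the keyword-matching stand-in for the model are assumptions for illustration, since the real evaluation prompts the model itself with the full context:

```python
import random

def build_haystack(needle: str, filler_sentences: int, seed: int = 0) -> str:
    """Insert the needle sentence at a random position in repetitive filler text."""
    rng = random.Random(seed)
    filler = ["The quick brown fox jumps over the lazy dog."] * filler_sentences
    filler.insert(rng.randrange(len(filler) + 1), needle)
    return " ".join(filler)

def find_needle(haystack: str, keyword: str):
    """Stand-in 'model': return the first sentence containing the keyword."""
    for sentence in haystack.split(". "):
        if keyword in sentence:
            return sentence
    return None

needle = "The secret launch code is 7421."
haystack = build_haystack(needle, filler_sentences=10_000)
print(find_needle(haystack, "launch code"))  # recovers the needle sentence
```

The real benchmark scores how reliably the model retrieves the needle as the haystack grows toward the full context-window size and as the needle moves to different depths within it.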
Gemini 1.5 is not only more capable but also more efficient than Bard and Gemini 1.0. According to Google, Gemini 1.5 Pro, the first Gemini 1.5 model available for early testing, performs at a similar level to Gemini 1.0 Ultra, the largest and most advanced model Google has released so far, while using less compute. This means Gemini 1.5 can run faster and more cheaply than Gemini 1.0, making it more accessible and scalable for developers and customers.
Google has been rebranding and upgrading its AI products since the Gemini models debuted in December 2023. It has since renamed Bard to Gemini and introduced Gemini Advanced, a paid tier that gives users access to its most capable models for conversation and content creation. Google also launched a new premium AI plan that bundles access to these features. Google claims that Gemini models are designed with safety and responsibility at the core, and that it follows rigorous testing and evaluation processes to ensure the quality and reliability of its AI systems.
Google Gemini 1.5 is a remarkable achievement in the field of AI, and it shows the potential of multimodal models to transform a wide range of domains and industries. However, Gemini 1.5 is still an experimental and evolving technology, and it faces many challenges and risks, such as addressing the ethical and social implications of its outputs, maintaining transparency and accountability in its development, and navigating the complex regulatory and legal landscape around AI.