Google launches Gemini: multimodal AI that Google says surpasses GPT-4

Google has just launched Gemini, a family of three AI models, all of which are meant to take on OpenAI's ChatGPT.

Gemini Nano is designed for mobile phones and can run offline on Android devices.
Gemini Pro is a slightly larger model used for Bard in the United States. It is designed to run many of Google's AI services.
Gemini Ultra, which will not launch until the new year, is the largest model and can perform very complex tasks. It is designed to run in data centers and power enterprise applications.

What the models have in common is that they are multimodal, meaning they can process text, audio, images, video, and computer code together. According to Google, Gemini Ultra outperforms GPT-4, OpenAI's most capable model, on 30 out of 32 benchmarks. Among other things, it is said to be well suited to advanced reasoning and image understanding.

Below, we have collected several examples from the launch of Gemini that relate to teaching.

Scientific articles

In this video, Gemini is used to read, understand, and filter 200,000 scientific articles and extract crucial scientific information, all during a lunch break.

Programming

In this video, Google demonstrates Gemini's advanced coding skills, including rapidly prototyping a web app for exploring London's train stations. Google also introduced AlphaCode 2, an advanced code generation system that can solve programming problems involving complex mathematics and theoretical computer science.

Mathematics and physics

This video shows Gemini using its multimodal capabilities and reasoning ability to examine a handwritten homework sheet. Gemini then creates customized explanations and practice questions that help users test and expand their knowledge of physics.