
Google launches Gemini: multimodal AI that surpasses GPT-4


Google has just launched Gemini, a project consisting of three AI models of different sizes, all of which will take on OpenAI's ChatGPT.

  • Gemini Nano is designed for mobile phones and can run offline on Android devices.
  • Gemini Pro is a larger model that powers Bard in the United States and is designed to run many of Google's AI services.
  • Gemini Ultra, which will not launch until the new year, is the largest model and can handle very complex tasks. It is designed for data centers and enterprise applications.

Common to all three models is that they are multimodal, meaning they can process text, audio, images, video, and computer code simultaneously. According to Google, Gemini Ultra should be better than OpenAI's best model, GPT-4, beating it in 30 out of 32 benchmarks. Among other things, it should be well suited for advanced reasoning and image understanding.
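To make the multimodality concrete, here is a minimal sketch of sending a combined text-and-image prompt to Gemini Pro Vision through Google's generative AI Python SDK (google-generativeai). The API key, the image file homework.jpg, and the prompt text are placeholders for this example, and exact model names and access may differ depending on your account.

    import google.generativeai as genai
    import PIL.Image

    # Configure the client with your own API key (placeholder below).
    genai.configure(api_key="YOUR_API_KEY")

    # "gemini-pro-vision" accepts mixed text and image input in one prompt.
    model = genai.GenerativeModel("gemini-pro-vision")

    # A single request can combine modalities: here, text plus an image.
    image = PIL.Image.open("homework.jpg")  # hypothetical example file
    response = model.generate_content(
        ["Explain, step by step, what this handwritten solution gets wrong.", image]
    )

    # The model's answer comes back as plain text.
    print(response.text)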

Below, we have collected several examples from the launch of Gemini that relate to teaching.

Scientific articles

In this video, Gemini is used to read, understand, and filter 200,000 scientific papers in order to extract the key scientific information, all within a lunch break.

Programming

In this video, Google demonstrates Gemini's advanced coding skills, including rapidly prototyping a web app for exploring London's train stations. Google also introduced AlphaCode 2, an advanced code-generation system that can solve programming problems involving complex mathematics and theoretical computer science. A small sketch of this kind of coding workflow follows below.
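As an illustration, here is a minimal sketch of asking Gemini Pro to draft code through the same Python SDK; the API key and the prompt are assumptions made for this example, not part of Google's demo.

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder key

    # "gemini-pro" is the text model exposed through the API at launch.
    model = genai.GenerativeModel("gemini-pro")

    # Ask for a small prototype; the prompt is a hypothetical example.
    response = model.generate_content(
        "Write a minimal Flask app with one route that lists London train "
        "stations from a hard-coded Python list."
    )

    print(response.text)  # the generated code is returned as plain text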

Mathematics and physics

This video shows Gemini using its multimodal capabilities and reasoning to examine a handwritten physics homework sheet. Gemini then creates customized explanations and practice questions that help the user test and expand their knowledge of physics.

Sound

Below, Google demonstrates Gemini's ability to understand speech in different languages from multiple speakers and to combine sight, sound, and text.

Gemini's multimodal options

Below are the multimodal options with Gemini.

Sources:

  • Gemini - Google DeepMind: "Gemini is built from the ground up for multimodality — reasoning seamlessly across image, video, audio, and code."
  • Google launches Gemini, the AI model it hopes will take down GPT-4: "Google let OpenAI take the lead in the AI race — now, it's mounting a comeback."
  • Google launches its largest and 'most capable' AI model, Gemini: "The company is planning to license Gemini to customers through Google Cloud for them to use in their own applications."
  • Google Just Launched Gemini, Its Long-Awaited Answer to ChatGPT: "Google says Gemini, launching today inside the Bard chatbot, is its 'most capable' AI model ever. It was trained on video, images, and audio as well as text."
  • Google DeepMind's new Gemini model looks amazing—but could signal peak AI hype: "It outmatches GPT-4 in almost all ways—but only by a little. Was the buzz worth it?"