Gemini: A Family of Highly Capable Multimodal Models

🤖 AI Summary
This paper introduces the Gemini family of multimodal models in three sizes: Ultra, Pro, and Nano. The family targets two goals at once: strong multimodal reasoning, and efficient deployment on memory-constrained devices. The models handle image, audio, video, and text understanding, and the report also describes the approach to post-training and responsible deployment. The most capable model, Gemini Ultra, advances the state of the art on 30 of the 32 benchmarks evaluated, including all 20 multimodal benchmarks examined. It is also the first model to reach human-expert performance on the MMLU exam benchmark. Gemini models are deployed through Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI, covering use cases from complex reasoning tasks to on-device applications.
📝 Abstract
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Problem

Research questions and friction points this paper is trying to address.

Developing highly capable multimodal models for diverse data types
Advancing state-of-the-art performance across multiple benchmarks
Enabling cross-modal reasoning and responsible deployment for various applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

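Multimodal models for image, audio, video, and text understanding
Three sizes (Ultra, Pro, and Nano) for applications ranging from complex reasoning to on-device use
Post-training and responsible deployment through Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI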
Gemini Team, Google
Rohan Anil (Google)
Sebastian Borgeaud (Google)
Yonghui Wu (Google)
Jean-Baptiste Alayrac (DeepMind, London): Computer Vision, Machine Learning, Artificial Intelligence
Jiahui Yu (Research Scientist, OpenAI): Artificial Intelligence
Radu Soricut (Distinguished Scientist at Google): VLMs, Multimodal Modeling
J. Schalkwyk (Google)
Andrew M. Dai (Google)
Anja Hauth (Google)
Katie Millican (Google)
David Silver (Google)
Slav Petrov (Vice President, Research at Google DeepMind): Natural Language Processing, Machine Learning
Melvin Johnson (Researcher, Google): Artificial Intelligence, Natural Language Processing, Machine Translation
Ioannis Antonoglou (Google)
Julian Schrittwieser (Google)
A. Glaese (Google)
Jilin Chen (Google): Human Computer Interaction, Machine Learning
Emily Pitler (Google)
T. Lillicrap (Google)
Angeliki Lazaridou (Research Scientist, Google DeepMind): Emergent Communication, Computational Linguistics, Natural Language Processing, Artificial
Orhan Firat (Google AI): Machine Learning
James Molloy (Google)
M. Isard (Google)
Paul Barham (Google DeepMind): Performance, Distributed Systems, Operating Systems, Networking, Machine Learning
Tom Hennigan (Google)
Benjamin Lee (JPMorganChase Global Technology Applied Research): Immersive Analytics, AR/VR, Data Visualisation, Situated Visualisation, Human-computer Interaction
Fabio Viola (Google)
Malcolm Reynolds (DeepMind): Machine Learning
Yuanzhong Xu (Unknown affiliation): Deep Learning, Systems, Security
Ryan Doherty (Google)
Eli Collins (Google)
Clemens Meyer (Google)
Eliza Rutherford (Google)
Erica Moreira (Google)
Kareem W. Ayoub (Google)
Megha Goel (Google)
George Tucker (Google DeepMind): Reinforcement Learning
Enrique Piqueras (Google)
M. Krikun (Google)
Iain Barr (Google)
Nikolay Savinov (Research Scientist, Google DeepMind): Deep Learning, Generative Models, Natural Language Processing, Computer Vision
Ivo Danihelka (Google)
Becca Roelofs (Google)
Anais White (Google)
Anders Andreassen (Google)
Tamara von Glehn (DeepMind)
Lakshman Yagati (Google)
Mehran Kazemi (Staff Research Scientist, Google DeepMind): Machine Learning, Large Language Models, Reasoning, Artificial General Intelligence
Lucas Gonzalez (Google)
Misha Khalman (Google)
Jakub Sygnowski (Google)
Alexandre Frechette (Google)
Charlotte Smith (Google)
Laura Culp (Google)
Lev Proleev (Google)
Yi Luan (Google Deepmind): Natural Language Processing, Multimodal, Neural Networks
Xi Chen (Google)
James Lottes (Google)
Nathan Schucher (Google)
Federico Lebron (Google)
Alban Rrustemi (Google)
Natalie Clay (Google)
Phil Crone (Google)
Tomás Kociský (Google)
Jeffrey Zhao (Google)
Bartek Perz (Google)
Dian Yu (Google)
Heidi Howard (Senior Researcher at Microsoft): Distributed Systems, Cloud Computing, Distributed Algorithms, Fault Tolerance, Distributed Consensus
Adam Bloniarz (Google): Causal inference, Machine learning
Jack W. Rae (Google)
Han Lu (Google)
L. Sifre (Google)
Marcello Maggioni (Google)
Fred Alcober (Google)
Daniel H Garrette (Google)
Megan Barnes (Google DeepMind): Natural Language Processing, Computational Linguistics, Semantics, Machine Learning
S. Thakoor (Google)
Jacob Austin (Researcher, DeepMind): Machine Learning, Program Synthesis, Programming Languages, Robotics, Reinforcement Learning
Gabriel Barth-Maron (Google)
William Wong (Google)
Rishabh Joshi (Google Deepmind, ex Brain Team): Language Technologies
Rahma Chaabouni (Google)
Deeni Fatiha (Google)
Arun Ahuja (Google Deepmind, Mount Sinai School of Medicine, Northwestern University)
Ruibo Liu (RS @Google DeepMind): ASI
Yunxuan Li (Google, California Institute of Technology): Physics, Artificial Intelligence, Natural Language Processing
Sarah Cogan (Google)
Jeremy Chen (Google)
Chao Jia (Google Deepmind): Deep Learning, Computer Vision
Chenjie Gu (DeepMind): Machine Learning, Deep Learning, Dynamical Systems, Model Order Reduction, Numerical Simulation and Optimization
Qiao Zhang (Google)
Jordan Grimstad (Google)
Ale Jakse Hartman (Google)
Martin Chadwick (DeepMind): Hippocampus, AI, Reinforcement Learning
Gaurav Singh Tomar (Google Deepmind): Natural Language processing, Language Technologies and Artificial Intelligence in Education
Xavier Garcia (Google)
Evan Senter (Google)
E. Taropa (Google)
Thanumalayan Sankaranarayana Pillai (Google)
J. Devlin (Google)
Michael Laskin (Google)
Diego de Las Casas (Google)
Dasha Valter (Google)
Connie Tao (Google)
Lorenzo Blanco (Google)
Adrià Puigdomènech Badia (Google)
David Reitter (Google DeepMind): Generative AI, Dialogue, Cognitive Science, Computational Psycholinguistics
Mianna Chen (Google)
Jenny Brennan (Google)
Clara Rivera (Google)
Sergey Brin (Google)
Shariq Iqbal (Research Scientist, Deepmind): Reinforcement Learning, Machine Learning
G. Surita (Google)
Jane Labanowski (Google)
Abhishek Rao (Google)
Stephanie Winkler (Google)
Emilio Parisotto (Carnegie Mellon University)
Yiming Gu (Google): Artificial Intelligence, Machine Learning, Transportation Engineering
Kate Olszewska (Google)
Yujing Zhang (Google)
Ravichandra Addanki (Google)
Antoine Miech (Google DeepMind): Computer Vision
Annie Louis (Google Deepmind, London): Natural language processing, Machine Learning
Laurent El Shafey (Google LLC): machine learning, computer vision, biometrics
Denis Teplyashin (Google)
Geoff Brown (Google)
Elliot Catt (Google)
Nithya Attaluri (Google)
Jan Balaguer (Deepmind): reinforcement learning, computational neuroscience
Jackie Xiang (Google)
Pidong Wang (Google)
Zoe C. Ashwood (Google)
Anton Briukhov (Google)
Albert Webson (Google)
Sanjay Ganapathy (Google)
Smit Sanghavi (Google)
Ajay Kannan (Google): Machine Learning, Deep Neural Networks, AI
Ming-Wei Chang (Google)
Axel Stjerngren (Google)
Josip Djolonga (Google): Machine learning
Yuting Sun (Google)
Ankur Bapna (Google Deepmind): Machine Learning, Natural Language Processing
Matthew Aitchison (Google)
Pedram Pejman (Google)
H. Michalewski (Google)
Tianhe Yu (Google DeepMind): Machine Learning, Foundation Models, Reinforcement Learning, Robotics
Cindy Wang (Medical Student at Columbia VP&S): machine learning, biomedical informatics, psychiatry, neonatology
J Christopher Love (Google)
Junwhan Ahn (Google DeepMind): Distributed Systems, Machine Learning, Computer Architecture, Datacenters
Dawn Bloxwich (Google)
Kehang Han (Google DeepMind): Language, Reliability, Chemistry
Peter Humphreys (DeepMind)
Thibault Sellam (Google Research): Natural Language Processing, Natural Language Generation
James Bradbury (Google)
Varun Godbole (Google)
Sina Samangooei (Google)
Bogdan Damoc (Google)
Alex Kaskasoli (Google)
Sébastien M. R. Arnold (Google DeepMind): Meta-Learning
Vijay Vasudevan (Google)
Shubham Agrawal (Google)
Jason Riesa (Google Inc): Natural Language Processing, Machine Translation, Machine Learning
Dmitry Lepikhin (Google)
Richard Tanburn (Google)
S. Srinivasan (Google)
Hyeontaek Lim (Google DeepMind): Distributed Systems, Networking, Operating Systems, Machine Learning
Sarah Hodkinson (Google)
Pranav Shyam (Google DeepMind): Artificial Intelligence, Deep Learning, Reinforcement Learning, Bayesian Statistics
Johan Ferret (Research Scientist, Google DeepMind): Reinforcement Learning, Machine Learning, Artificial Intelligence
Steven Hand (Cambridge): Operating Systems, Networking, System Security
Ankush Garg (Google)
T. Paine (Google)
Jian Li (Google)
Yujia Li (Research Scientist, Google DeepMind): Machine Learning, Computer Vision, Natural Language Processing, Optimization
Minh Giang (Google)
Alexander Neitz (Max Planck Institute for Intelligent Systems): Deep Learning, Machine Learning, Reinforcement Learning
Zaheer Abbas (Research Engineer, Google DeepMind): Artificial Intelligence, Machine Learning, Reinforcement Learning
Sarah York (Google)
Machel Reid (Research Scientist, Google DeepMind): natural language processing, machine learning
Elizabeth Cole (Google)
Aakanksha Chowdhery (Google)
Dipanjan Das (Google)