Gemma 3 Technical Report

📅 2025-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
The Gemma 3 Technical Report addresses key limitations of lightweight open models: weak multimodal capabilities, high memory overhead for long-context processing, and limited multilingual coverage and reasoning performance. The report makes three main contributions: (1) an architecture with a higher ratio of local to global attention layers and a short local attention span, which reduces KV-cache memory at long context; (2) vision understanding, a context of at least 128K tokens, and distillation-based training combined with a new post-training recipe that improves math, chat, instruction-following, and multilingual abilities; and (3) the openly released Gemma 3 series, ranging from 1 to 27 billion parameters, in which Gemma3-4B-IT is competitive with Gemma2-27B-IT and Gemma3-27B-IT is comparable to Gemini-1.5-Pro across benchmarks, which makes the models suited to resource-constrained deployment.

📝 Abstract
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages, and a longer context of at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers, and keeping the span on local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction-finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
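The KV-cache saving described in the abstract can be illustrated with a back-of-the-envelope estimate. The sketch below is not taken from the report: the function name `kv_cache_bytes`, the layer count, the head sizes, the 5:1 interleaving ratio, and the 1024-token window are all hypothetical values chosen only to show why more local layers with a short span shrink the cache.

```python
def kv_cache_bytes(num_layers, local_per_global, context_len, window,
                   num_kv_heads, head_dim, bytes_per_value=2):
    """Rough KV-cache size in bytes for a single sequence.

    Assumes layers repeat in blocks of `local_per_global` local
    (sliding-window) layers followed by one global layer.
    Use local_per_global=0 for an all-global baseline.
    """
    num_global = num_layers // (local_per_global + 1)
    num_local = num_layers - num_global
    # Keys and values: two tensors per layer, one vector per KV head per token.
    bytes_per_token_per_layer = 2 * num_kv_heads * head_dim * bytes_per_value
    # Global layers cache every token; local layers cache at most `window` tokens.
    cached_tokens = (num_global * context_len
                     + num_local * min(context_len, window))
    return bytes_per_token_per_layer * cached_tokens


# Hypothetical model shape, chosen only for illustration.
shape = dict(num_layers=48, num_kv_heads=8, head_dim=128)
context_len = 128 * 1024  # a 128K-token context

all_global = kv_cache_bytes(local_per_global=0, context_len=context_len,
                            window=context_len, **shape)
mostly_local = kv_cache_bytes(local_per_global=5, context_len=context_len,
                              window=1024, **shape)

print(f"all global layers:  {all_global / 2**30:.2f} GiB")
print(f"5 local : 1 global: {mostly_local / 2**30:.2f} GiB")
print(f"ratio:              {mostly_local / all_global:.2f}")
```

Under these assumed numbers the cache drops from about 24 GiB to about 4.2 GiB, because only the few global layers still store one entry per token of the full context while each local layer is capped at the window length.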
Problem

Research questions and friction points this paper is trying to address.

Enhancing vision understanding in lightweight open models
Reducing KV-cache memory for long context handling
Improving math, chat, and multilingual abilities via distillation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal lightweight model with vision understanding
Enhanced architecture reducing KV-cache memory
Distillation training improving multilingual and math abilities
Authors
Gemma Team
Aishwarya Kamath
Google DeepMind
Johan Ferret
Research Scientist, Google DeepMind
Reinforcement Learning, Machine Learning, Artificial Intelligence
Shreya Pathak
Indian Institute of Technology Bombay
Computer Science
Nino Vieillard
Google DeepMind
Reinforcement Learning
Ramona Merhej
Google DeepMind
Sarah Perrin
Google DeepMind
Tatiana Matejovicova
Google DeepMind
Alexandre Ramé
Google DeepMind
Morgane Rivière
Google DeepMind
Louis Rouillard
Google DeepMind
Thomas Mesnard
Research Scientist at Google DeepMind
LLM, Reinforcement Learning, Artificial Intelligence
Geoffrey Cideron
Google DeepMind
Jean-Bastien Grill
Google DeepMind
Sabela Ramos
Software Engineer. Google.
Reinforcement Learning, High Performance Computing, Shared Memory Programming, Algorithm Optimization, Cloud Computing
Edouard Yvinec
Google DeepMind
Michelle Casbon
Google DeepMind
Etienne Pot
Google DeepMind
Ivo Penchev
Google DeepMind
Gael Liu
Google DeepMind
Francesco Visin
Senior Research Scientist at Google DeepMind
Model based Reinforcement Learning
Kathleen Kenealy
Google DeepMind
Lucas Beyer
Meta, OpenAI, Google DeepMind, Google Brain, RWTH Aachen
Representation Learning, Good Shit, Computer Vision, Robotics
Xiaohai Zhai
Google DeepMind
Anton Tsitsulin
Research Scientist, Google Research
graph machine learning, data-centric machine learning
R. Busa-Fekete
Google DeepMind
Alex Feng
Google DeepMind
Noveen Sachdeva
Research Scientist, Google DeepMind
Machine Learning
Benjamin Coleman
Google DeepMind
Machine Learning, Data Structures and Algorithms
Yi Gao
Google DeepMind
Basil Mustafa
Google Deepmind
Machine Learning, Computer Vision, Multimodality, Uncertainty Quantification, AI for Health
Iain Barr
Google DeepMind
Emilio Parisotto
Carnegie Mellon University
David Tian
Google DeepMind
Matan Eyal
Google Research
Natural Language Processing
Colin Cherry
Google Research
Natural Language Processing, Computational Linguistics, Machine Translation
Jan-Thorsten Peter
Google
Neural Machine Translation
Danila Sinopalnikov
Google DeepMind
Surya Bhupatiraju
Google DeepMind
Rishabh Agarwal
Meta, ex DeepMind, Google Brain
Reinforcement Learning, Deep Learning, Artificial Intelligence
Mehran Kazemi
Staff Research Scientist, Google DeepMind
Machine Learning, Large Language Models, Reasoning, Artificial General Intelligence
Dan Malkin
Google Research
Natural Language Processing, Computational Linguistics
Ravin Kumar
Bachelor of Technology in Computer Science and Engineering
artificial intelligence, deep learning, algorithm design, economics, mathematics
David Vilar
Staff Research Scientist, Google
Machine Translation, Machine Learning
Idan Brusilovsky
Google DeepMind
Jiaming Luo
Shanghai Jiao Tong University
Dialogue System, Digital Mental Health
Abe Friesen
Google DeepMind
Abhanshu Sharma
Google DeepMind
Abheesht Sharma
Machine Learning, Keras, Google
Computational Linguistics, Ad Fraud Detection, Computer Vision, Deep Learning, Machine Learning
Adi Mayrav Gilady
Google DeepMind
Adrian Goedeckemeyer
Google DeepMind
Alaa Saade
Google DeepMind
Alexander Kolesnikov
Meta; Previously: OpenAI, Google Brain / Deepmind, IST Austria
AI, Machine learning, Deep learning, Computer vision
Alexei Bendebury
Google DeepMind
Alvin Abdagic
Google DeepMind
Amit Vadi
Google DeepMind
András György
Google DeepMind
André Susano Pinto
Google DeepMind
Anil Das
Google DeepMind
Ankur Bapna
Google Deepmind
Machine Learning, Natural Language Processing
Antoine Miech
Google DeepMind
Computer Vision
Antoine Yang
Google DeepMind
Computer Vision, Machine Learning, Deep Learning, Vision and Language
Antonia Paterson
Google DeepMind
Ashish Shenoy
Meta
NLP, Speech, Computer Vision, Multimodal LLMs, Federated Learning
Ayan Chakrabarti
Google DeepMind
Computer Vision, Machine Learning, Computational Photography
Bilal Piot
Google Deepmind
reinforcement learning, inverse reinforcement learning
Bo Wu
Google DeepMind
Bobak Shahriari
Google DeepMind
Post-training LLM | Gemini Diffusion | Gemma | Acme | Reinforcement learning | Bayesian optimization
Bryce Petrini
Google DeepMind
Charlie Chen
Google DeepMind
Charline Le Lan
Google DeepMind, Research Scientist
Reinforcement Learning, Deep Learning, Machine Learning, Artificial Intelligence
Christopher A. Choquette-Choo
OpenAI
machine learning, trustworthy machine learning, data privacy, adversarial machine learning, security
CJ Carey
Google
Machine Learning
Daniel Deutsch
University of Pennsylvania
natural language processing, machine learning
Danielle Eisenbud
Google DeepMind
Dee Cattle
Google DeepMind
Derek Cheng
Google DeepMind
Dimitris Paparas
Google Research
Theoretical Computer Science, Machine Learning, Large Language Models
Divyashree Shivakumar Sreepathihalli
Google DeepMind
Doug Reid
Google DeepMind
Dustin Tran
Research Scientist, Google
Artificial Intelligence, Machine Learning, Statistics, Deep Learning
Dustin Zelle
Google DeepMind
Eric Noland
Google DeepMind
Erwin Huizenga
Google DeepMind
Frederick Liu
Google
Glenn Cameron
Google DeepMind
Hadi Hashemi
Google DeepMind
Hanna Klimczak-Plucińska
Google DeepMind
Harman Singh
CS PhD @ UC Berkeley Prev: DeepMind, Meta (FAIR)
Reasoning, Natural Language Processing, Multimodality
Harsh Mehta
Google DeepMind
Harshal Tushar Lehri
Google DeepMind
Hussein Hazimeh
OpenAI
Machine Learning, Optimization, High-dimensional Statistics
Ian Ballantyne
Google DeepMind
Idan Szpektor
Google Research
NLP, Generative LLMs, Factuality&Grounding
Ivan Nardini
Google DeepMind
Jean Pouget-Abadie
Google DeepMind
Jetha Chan
Google DeepMind
Joe Stanton
Google DeepMind
J. Michael Wieting
Google DeepMind
Jonathan Lai
Google DeepMind
Jordi Orbay
Google DeepMind
Joseph Fernandez
Google DeepMind
Joshua Newlan
Google DeepMind
Ju-yeong Ji
Google DeepMind
Jyotinder Singh
Google DeepMind
Kat Black
Google DeepMind
Kathy Yu
Google DeepMind
Kevin Hui
Google DeepMind
Kiran Vodrahalli
Google DeepMind
Klaus Greff
Research Scientist at Google Brain
Machine Learning, Neural Networks
Linhai Qiu
Google DeepMind
Marcella Valentine
Google DeepMind
Marina Coelho
Google DeepMind
Marvin Ritter
Google DeepMind
Matt Hoffman
Google DeepMind
Matthew Watson
Google DeepMind
Mayank Chaturvedi
Google DeepMind
Michael Moynihan
Google DeepMind
Min Ma
Google DeepMind
Nabila Babar
Google DeepMind
Natasha Noy
Google
dataset search, semantic web, structured data, artificial intelligence
Nathan Byrd
Google DeepMind
Nick Roy
Google DeepMind
Nikola Momchev
Google
Nilay Chauhan
Google DeepMind
Oskar Bunyan
Google DeepMind
Pankil Botarda
Google DeepMind
Paul Caron
Google DeepMind
Phil Culliton
Google DeepMind
Philipp Schmid
Google DeepMind
Pier Giuseppe Sessa
Google DeepMind
machine learning, game theory
Pingmei Xu
Google DeepMind
Rakesh Shivanna
Google DeepMind
Renjie Wu
University of California, Riverside
Time Series Analysis, Data Mining, Machine Learning
Renke Pan
Google DeepMind
Rob Willoughby
Google DeepMind
Rohith Vallu
Google DeepMind
Ryan Mullins
Google DeepMind
Sammy Jerome
Google DeepMind
Sara Smoot
Google DeepMind
Sertan Girgin
Google DeepMind
Shariq Iqbal
Research Scientist, Deepmind
Reinforcement Learning, Machine Learning
Shashir Reddy
Engineer, Google, Inc.
Shruti Sheth
Google DeepMind
Siim Põder
Google DeepMind
Sijal Bhatnagar
Google DeepMind
Sivan Eiger
Google DeepMind
Susan Zhang
FAIR
Tianqi Liu
Google DeepMind
Trevor Yacovone
Google DeepMind
Uday Kalra
Google DeepMind
Utku Evci
Researcher @Google Deepmind
Deep Learning
Vedant Misra
DeepMind
applied mathematics, deep learning, artificial intelligence, machine learning, theoretical physics
Vincent Roseberry
Google DeepMind
Vladimir Feinberg
Google DeepMind, Senior Staff SWE
machine learning
Vlad Kolesnikov
Google DeepMind
Woohyun Han
Google DeepMind
Efficient ML, Machine Learning, ML Optimization
Woosuk Kwon
PhD student, UC Berkeley
Machine Learning, Systems
Xi Chen
Google DeepMind
Yinlam Chow
Research Scientist, Google Research
Reinforcement learning, Optimal Control, Sequential Decision Making, Robust Control, Nonlinear Systems
Yuvein Zhu
Google DeepMind
Zichuan Wei
Google DeepMind