Ministral 3

📅 2026-01-13
📈 Citations: 4
Influential: 1
📄 PDF
🤖 AI Summary
This work addresses the lack of efficient, multifunctional small language models suitable for compute- and memory-constrained environments by introducing the Ministral 3 series—parameter-efficient dense models at 3B, 8B, and 14B scales. Each variant is released in three versions: base pretrained, instruction-tuned, and reasoning-optimized, with support for multimodal image understanding. The core innovation lies in a cascaded distillation approach that integrates iterative pruning, continual knowledge distillation, and multitask continued pretraining, achieving substantial gains in inference efficiency without compromising performance. Evaluated across complex reasoning and general-purpose tasks, the entire model family demonstrates strong empirical results and is released under the Apache 2.0 license.
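The cascaded recipe summarized above alternates pruning with distillation-driven recovery training. The following plain-Python sketch illustrates only that control flow, under stated assumptions: rank-based magnitude pruning stands in for the paper's (unspecified) pruning criterion, and the `recover` callback stands in for continued pretraining with knowledge distillation. All function names are illustrative, not from the paper.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude `sparsity` fraction of weights."""
    n_prune = int(len(weights) * sparsity)
    # Rank indices by absolute weight; the smallest get pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

def cascade_distillation(weights, rounds=3, per_round_sparsity=0.25,
                         recover=lambda w: w):
    """Alternate pruning rounds with a recovery phase (here a no-op stub
    standing in for continued training under teacher distillation)."""
    for _ in range(rounds):
        weights = magnitude_prune(weights, per_round_sparsity)
        weights = recover(weights)  # distillation would update weights here
    return weights

print(magnitude_prune([3.0, -1.0, 0.5, 2.0], 0.5))  # → [3.0, 0.0, 0.0, 2.0]
```

Pruning in small increments per round, with recovery between rounds, is what lets such a cascade reach high overall sparsity without the accuracy cliff of one-shot pruning.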

📝 Abstract
We introduce the Ministral 3 series, a family of parameter-efficient dense language models designed for compute- and memory-constrained applications, available in three model sizes: 3B, 8B, and 14B parameters. For each model size, we release three variants: a pretrained base model for general-purpose use, an instruction-finetuned model, and a reasoning model for complex problem-solving. In addition, we present our recipe for deriving the Ministral 3 models through Cascade Distillation, a technique that interleaves iterative pruning with continued training under distillation. Each model comes with image understanding capabilities, all under the Apache 2.0 license.
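The "continued training with distillation" step in the abstract trains the pruned student against the teacher's output distribution. Below is a minimal sketch of the standard temperature-scaled distillation objective (forward KL between softened teacher and student distributions, scaled by T²). This is the generic textbook formulation, not necessarily the exact loss used for Ministral 3; all names are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 to keep gradient magnitudes comparable across T."""
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# A student that matches the teacher exactly incurs zero loss.
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # → 0.0
```

Raising the temperature softens the teacher distribution, exposing the relative probabilities of incorrect classes ("dark knowledge") that a hard one-hot target would hide from the student.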
Problem

Research questions and friction points this paper is trying to address.

parameter-efficient
dense language models
compute-constrained
memory-constrained
image understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cascade Distillation
parameter-efficient
dense language models
iterative pruning
image understanding
🔎 Similar Papers
No similar papers found.
👥 Authors
Alexander H. Liu (Massachusetts Institute of Technology), Kartik Khandelwal, Sandeep Subramanian (Mistral AI), Victor Jouault, Abhinav Rastogi (Google DeepMind), Adrien Sadé, Alan Jeffares (University of Cambridge), Albert Q. Jiang (Mistral AI), Alexandre Cahill, Alexandre Gavaudan, Alexandre Sablayrolles (Meta AI), Amélie Héliou, Amos You, Andy Ehrenberg, Andy Lo (Mistral AI), Anton Eliseev, Antonia Calvi, Avinash Sooriyarachchi, Baptiste Bout, Baptiste Rozière, Baudouin De Monicault, Clémence Lanfranchi, Corentin Barreau, Cyprien Courtot, Daniele Grattarola (Isomorphic Labs), Darius Dabert, Diego de Las Casas, Elliot Chane-Sane, Faruk Ahmed (Mistral AI), Gabrielle Berrada, Gaetan Ecrepont, Gauthier Guinet, Georgii Sergeevich Novikov, Guillaume Kunsch, Guillaume Lample (Mistral AI), Guillaume Martin (Université Montpellier - CNRS - IRD), Gunshi Gupta (University of Oxford), Jan Ludziejewski (Mistral AI), Jason Rute, J. Studnia, Jonas Amar, Joséphine Delas, Josselin Somerville Roberts, Karmesh Yadav (Georgia Tech), K. Chandu, Kush Jain (Mistral AI), Laurence Aitchison (University of Bristol), Laurent Fainsin, Léonard Blier, Lingxiao Zhao (Mistral AI), Louis Martin (Mistral AI), Lucile Saulnier, Luyu Gao (Carnegie Mellon University), M. Buyl, Margaret Jennings, Marie Pellat, Mark Prins, Mathieu Poirée, Mathilde Guillaumin, Matthieu Dinot, Matthieu Futeral, Maxime Darrin, Maximilian Augustin (University of Tübingen), Mia Chiquier, Michel Schimpf, Nathan Grinsztajn (Cohere), Neha Gupta, N. Raghuraman, Olivier Bousquet (Google), Olivier Duchenne (Meta AI), Patricia Wang, Patrick von Platen (Mistral AI), Paul Jacob (Athlone Institute of Technology), Paul Wambergue, Paula Kurylowicz, Pavankumar Reddy Muddireddy, Philomène Chagniot, Pierre Stock (Mistral AI), Pravesh Agrawal, Quentin Torroba, Romain Sauvestre, Roman Soletskyi, Rupert Menneer, S. Vaze, Samuel Barry, Sanchit Gandhi (Mistral AI), Siddhant Waghjale, Siddharth Gandhi (Mistral AI), Soham Ghosh, Srijan Mishra, Sumukh Aithal, Szymon Antoniak (Mistral AI), Teven Le Scao, Théo Cachet, Theo Simon Sorg, Thibaut Lavril, Thiziri Nait Saada, Thomas Chabal (Inria / École Normale Supérieure, PSL), Thomas Foubert, Thomas Robert (Univ. Lyon - Univ. Eiffel), Thomas Wang, Tim Lawson (University of Bristol), Tom Bewley (Mistral AI), Tom Edwards, Umar Jamil, Umberto Tomasini, Valeriia Nemychnikova, Van Phung, V. Maladiere, Virgile Richard, Wassim Bouaziz, Wen-Ding Li (Cornell University), William Marshall, Xinghui Li, Xinyu Yang, Yassine El Ouahidi, Yihan Wang, Yunhao Tang (Anthropic), Zaccharie Ramzi