Magistral

📅 2025-06-12
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work investigates training large language models (LLMs) for reasoning under a pure reinforcement learning (RL) paradigm, using text-only data while preserving multimodal understanding, instruction following, and function-calling capabilities. The authors build a scalable, fully self-contained RL training stack on their own infrastructure, without relying on external RL trajectories or knowledge-distillation data from prior models, and present a simple method to control the language in which the model reasons. Starting from Mistral Medium 3 as the base model, they construct cold-start data and a policy-optimization recipe, release Magistral Medium (trained with RL alone), and open-source Magistral Small, together with its cold-start dataset, under the Apache 2.0 license. Experiments demonstrate substantial gains over the initial checkpoint on complex reasoning benchmarks.

📝 Abstract
We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior models, we follow a ground-up approach, relying solely on our own models and infrastructure. Notably, we demonstrate a stack that enabled us to explore the limits of pure RL training of LLMs, present a simple method to force the reasoning language of the model, and show that RL on text data alone maintains most of the initial checkpoint's capabilities. We find that RL on text maintains or improves multimodal understanding, instruction following and function calling. We present Magistral Medium, trained for reasoning on top of Mistral Medium 3 with RL alone, and we open-source Magistral Small (Apache 2.0), which further includes cold-start data from Magistral Medium.
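The abstract mentions "a simple method to force the reasoning language of the model." One natural way to realize such a constraint in an RL loop is a reward term that scores whether the reasoning trace is written in the target language. The sketch below is a minimal, hypothetical illustration of that idea — the `<think>` tag convention, the stop-word overlap heuristic, and the 0/1 reward are illustrative assumptions, not the paper's actual reward function.

```python
import re

# Illustrative assumption: a small stop-word set stands in for a real
# language-identification model when scoring the reasoning trace.
ENGLISH_STOPWORDS = {"the", "a", "is", "of", "and", "to", "so", "we", "then"}

def language_reward(completion: str, target_lang_words: set) -> float:
    """Hedged sketch of a language-consistency reward: return 1.0 when the
    reasoning trace inside <think>...</think> appears to be in the target
    language, else 0.0. Not Magistral's actual reward."""
    m = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if m is None:
        return 0.0  # malformed completion: no reasoning block to score
    tokens = re.findall(r"[a-zA-ZÀ-ÿ']+", m.group(1).lower())
    if not tokens:
        return 0.0  # empty trace gets no credit
    hits = sum(1 for t in tokens if t in target_lang_words)
    # Reward only traces where a meaningful share of words match the
    # target language (threshold is an arbitrary illustrative choice).
    return 1.0 if hits / len(tokens) >= 0.2 else 0.0

reward = language_reward(
    "<think>so the answer is 4 and we are done</think>4", ENGLISH_STOPWORDS
)
```

In a full RL pipeline, a term like this would be combined with the task's correctness reward, so the policy is penalized for drifting into a different language mid-reasoning while still being optimized for the answer itself.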
Problem

Research questions and friction points this paper addresses.

Develop a scalable RL pipeline for LLM training
Explore the limits of pure RL training for LLMs
Maintain model capabilities with RL on text-only data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scalable, self-contained pure-RL training pipeline
Simple method to force the model's reasoning language
RL on text maintains the initial checkpoint's capabilities
👥 Authors
M. Rastogi, Albert Q. Jiang, Andy Lo, Gabrielle Berrada, G. Lample, Jason Rute, Joep Barmentlo, Karmesh Yadav, Kartikay Khandelwal, Khyathi Raghavi Chandu, Léonard Blier, Lucile Saulnier, Matthieu Dinot, Maxime Darrin, Neha Gupta, Roman Soletskyi, S. Vaze, Teven Le Scao, Yihan Wang, Adam Yang, Alexander H. Liu, Alexandre Sablayrolles, Amélie Héliou, Amélie S. Martin, Andrew Ehrenberg, Anmol Agarwal, Antoine Roux, Arthur Darcet, Arthur Mensch, Baptiste Bout, B. Roziere, Baudouin De Monicault, Chris Bamford, Christian Wallenwein, Christophe Renaudin, Clémence Lanfranchi, Darius Dabert, Devon Mizelle, Diego de las Casas, Elliot Chane-Sane, Emilie Fugier, Emma Bou Hanna, G. Delerce, Gauthier Guinet, G. Novikov, Guillaume Martin, Himanshu Jaju, Jan Ludziejewski, Jean-Hadrien Chabran, Jean-Malo Delignon, Joachim Studnia, Josselin Somerville Roberts, Julien Denize, Karan Saxena, Kush Jain, Lingxiao Zhao, Louis Martin, Luyu Gao, Lélion Renard Lavaud, M. Pellat, Mathilde Guillaumin, Mathis Felardos, M. Augustin, Mickael Seznec, N. Raghuraman, Olivier Duchenne, Patricia Wang, P. Platen, Patryk Saffer, Paul Jacob, P. Wambergue, Paula Kurylowicz, Pavankumar Reddy Muddireddy, Philomene Chagniot, Pierre Stock, Pravesh Agrawal, Romain Sauvestre, Remi Delacourt, Sanchit Gandhi, S. Subramanian, Shashwat Dalal, Siddharth Gandhi, Soham Ghosh, Srijan Mishra, Sumukh K. Aithal, S. Antoniak, Thibault Schueller, Thibaut Lavril, Thomas Robert, Thomas Wang, Timothée Lacroix, Valeriia Nemychnikova, Victor Paltz, Virgile Richard, Wen-Ding Li, William Marshall, Xuanyu Zhang, Yunhao Tang