Llama-Nemotron: Efficient Reasoning Models

📅 2025-05-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address industry demand for efficient inference and memory optimization, this work introduces an open-source heterogeneous inference model series (8B/49B/253B) featuring a novel architecture that enables dynamic switching among inference modes. Methodologically, we integrate neural architecture search (built upon Llama 3), knowledge distillation, continual pretraining, supervised fine-tuning, and large-scale reinforcement learning in a joint optimization framework. Our key contributions are: (1) the first commercially licensed, full-stack open-source inference models—accompanied by complete post-training datasets and training code; (2) competitive reasoning capability relative to DeepSeek-R1, while achieving substantial throughput gains and reduced memory footprint; and (3) unified strong reasoning performance and inference efficiency via on-demand, runtime mode switching. All models, datasets, and training frameworks are publicly released under permissive open-source licenses.

Technology Category

Application Category

📝 Abstract
We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. The family comes in three sizes -- Nano (8B), Super (49B), and Ultra (253B) -- and performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior inference throughput and memory efficiency. In this report, we discuss the training procedure for these models, which entails using neural architecture search from Llama 3 models for accelerated inference, knowledge distillation, and continued pretraining, followed by a reasoning-focused post-training stage consisting of two main parts: supervised fine-tuning and large scale reinforcement learning. Llama-Nemotron models are the first open-source models to support a dynamic reasoning toggle, allowing users to switch between standard chat and reasoning modes during inference. To further support open research and facilitate model development, we provide the following resources: 1. We release the Llama-Nemotron reasoning models -- LN-Nano, LN-Super, and LN-Ultra -- under the commercially permissive NVIDIA Open Model License Agreement. 2. We release the complete post-training dataset: Llama-Nemotron-Post-Training-Dataset. 3. We also release our training codebases: NeMo, NeMo-Aligner, and Megatron-LM.
Problem

Research questions and friction points this paper is trying to address.

Develop efficient open-source reasoning models for enterprise use
Enhance inference throughput and memory efficiency in reasoning tasks
Introduce dynamic reasoning toggle for flexible chat and reasoning modes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Heterogeneous models with dynamic reasoning toggle
Neural architecture search for accelerated inference
Open-source under NVIDIA Open Model License
A
A. Bercovich
NVIDIA
Itay Levy
Itay Levy
Researcher, NVIDIA
Natural Language Processing
I
Izik Golan
NVIDIA
Mohammad Dabbah
Mohammad Dabbah
Group CTO, Como 1907 | SENT Entertainment
Machine LearningArtificial IntelligencePattern RecognitionComputer VisionSignal Processing
Ran El-Yaniv
Ran El-Yaniv
Professor of Computer Science, Technion - Israel Institute of Technology. Chief Scientist - Deci AI
Machine learningdeep learningfinancial modeling
Omri Puny
Omri Puny
Ph.D. student, Weizmann Institute of Science
Graph Neural NetworksGeometric Deep LearningDeep Learning
Ido Galil
Ido Galil
Technion - Israel Institute of Technology
Machine learningDeep learning
Zach Moshe
Zach Moshe
Unknown affiliation
T
Tomer Ronen
NVIDIA
Najeeb Nabwani
Najeeb Nabwani
Unknown affiliation
Deep Learning
I
Ido Shahaf
NVIDIA
O
Oren Tropp
NVIDIA
E
Ehud Karpas
NVIDIA
R
Ran Zilberstein
NVIDIA
J
Jiaqi Zeng
NVIDIA
Soumye Singhal
Soumye Singhal
NVIDIA
Deep LearningNLPArtificial Intelligence
A
Alexander Bukharin
NVIDIA
Yian Zhang
Yian Zhang
Unknown affiliation
Computer ScienceNatural Language ProcessingMachine LearningHuman Computer Interaction
T
Tugrul Konuk
NVIDIA
G
Gerald Shen
NVIDIA
A
Ameya Mahabaleshwarkar
NVIDIA
Bilal Kartal
Bilal Kartal
NVIDIA
AIDeep LearningReinforcement LearningMulti-Agent Systems
Yoshi Suhara
Yoshi Suhara
NVIDIA
Natural Language ProcessingMachine LearningComputational Social Science
Olivier Delalleau
Olivier Delalleau
NVIDIA
Artificial Intelligence
Zijia Chen
Zijia Chen
Senior Deep Learning Scientist, NVIDIA Corporation
Natural Language ProcessingArtificial IntelligenceMultimodal Model
Z
Zhilin Wang
NVIDIA
D
David Mosallanezhad
NVIDIA
A
Adi Renduchintala
NVIDIA
Haifeng Qian
Haifeng Qian
Principal Applied Scientist, NVIDIA
Electrical EngineeringComputer ScienceMathematics
D
Dima Rekesh
NVIDIA
Fei Jia
Fei Jia
NVIDIA
Somshubra Majumdar
Somshubra Majumdar
NVIDIA
Machine LearningDeep LearningComputer VisionTime SeriesSpeech Recognition
V
V. Noroozi
NVIDIA
W
W. Ahmad
NVIDIA
S
Sean Narenthiran
NVIDIA
A
Aleksander Ficek
NVIDIA
Mehrzad Samadi
Mehrzad Samadi
NVIDIA (Parabricks)
High Performance ComputingGenomicsGPUs
J
Jocelyn Huang
NVIDIA
S
Siddhartha Jain
NVIDIA
Igor Gitman
Igor Gitman
Applied Scientist, NVIDIA
Large Language ModelsMath ReasoningDeep Learning
I
I. Moshkov
NVIDIA
W
Wei Du
NVIDIA
Shubham Toshniwal
Shubham Toshniwal
Senior Research Scientist, NVIDIA
ReasoningMemoryNLP
G
George Armstrong
NVIDIA
B
B. Kisačanin
NVIDIA
Matvei Novikov
Matvei Novikov
NVIDIA
computer science
D
Daria Gitman
NVIDIA
E
E. Bakhturina
NVIDIA
J
Jane Scowcroft
NVIDIA
J
John Kamalu
NVIDIA
Dan Su
Dan Su
Tencent AI Lab
speech recognitionspeech synthesisspeaker recognition
Kezhi Kong
Kezhi Kong
NVIDIA
Machine Learning
Markus Kliegl
Markus Kliegl
NVIDIA
deep learningmachine learningartificial intelligencefluid mechanicsPDE
R
Rabeeh Karimi
NVIDIA
Y
Ying Lin
NVIDIA
S
S. Satheesh
NVIDIA
J
Jupinder Parmar
NVIDIA
Pritam Gundecha
Pritam Gundecha
NVIDIA
Brandon Norick
Brandon Norick
Microsoft
Large language models
J
Joseph Jennings
NVIDIA
Shrimai Prabhumoye
Shrimai Prabhumoye
Senior Research Scientist @NVIDIA and Adjunct Assistant Professor @Boston University
Natural Language Processing
Syeda Nahida Akter
Syeda Nahida Akter
Carnegie Mellon University
Multimodal Machine LearningLarge Language ModelVision Language ModelCommonsense Reasoning
M
M. Patwary
NVIDIA
Abhinav Khattar
Abhinav Khattar
NVIDIA
Machine LearningNatural Language ProcessingDeep Learning
Deepak Narayanan
Deepak Narayanan
NVIDIA
Computer SystemsSystems for Machine Learning
R
R. Waleffe
NVIDIA
J
Jimmy Zhang
NVIDIA
B
Bor-Yiing Su
NVIDIA
Guyue Huang
Guyue Huang
NVIDIA
Terry Kong
Terry Kong
Unknown affiliation
P
Parth Chadha
NVIDIA
S
Sahil Jain
NVIDIA
C
Christine Harvey
NVIDIA
Elad Segal
Elad Segal
NVIDIA
Natural Language UnderstandingMachine Learning
J
Jining Huang
NVIDIA
S
S. Kashirsky
NVIDIA
R
Robert Mcqueen
NVIDIA
I
Izzy Putterman
NVIDIA
G
George Lam
NVIDIA
A
Arun Venkatesan
NVIDIA
S
Sherry Wu
NVIDIA
V
Vinh Nguyen
NVIDIA
M
M. Kilaru
NVIDIA
Andrew Wang
Andrew Wang
University of Toronto, Vector Institute
AI Safety
A
Anna Warno
NVIDIA
A
Abhilash Somasamudramath
NVIDIA
S
Sandip Bhaskar
NVIDIA
M
Maka Dong
NVIDIA
N
Nave Assaf
NVIDIA
S
Shahar Mor
NVIDIA
O
Omer Ullman Argov
NVIDIA
S
Scot Junkin
NVIDIA
O
Oleksandr Romanenko
NVIDIA
Pedro Larroy
Pedro Larroy
NVIDIA
M
Monika Katariya
NVIDIA
M
Marco Rovinelli
NVIDIA
V
Viji Balas
NVIDIA
N
Nicholas Edelman
NVIDIA
A
Anahita Bhiwandiwalla
NVIDIA
M
Muthu Subramaniam
NVIDIA
S
Smita Ithape
NVIDIA
K
Karthik Ramamoorthy
NVIDIA
Y
Yuting Wu
NVIDIA
S
S. Velury
NVIDIA
O
Omri Almog
NVIDIA
J
Joyjit Daw
NVIDIA
D
Denys Fridman
NVIDIA
E
Erick Galinkin
NVIDIA
Michael Evans
Michael Evans
NVIDIA
K
Katherine Luna
NVIDIA
Leon Derczynski
Leon Derczynski
ITU Copenhagen & NVIDIA
Natural Language ProcessingMachine LearningLLM SecurityOnline Harms
N
Nikki Pope
NVIDIA
E
Eileen Long
NVIDIA
S
Seth Schneider
NVIDIA
G
Guillermo Siman
NVIDIA
T
Tomasz Grzegorzek
NVIDIA
P
Pablo Ribalta
NVIDIA
J
Joey Conway
NVIDIA
T
Trisha Saar
NVIDIA
A
Ann Guan
NVIDIA
K
Krzysztof Pawelec
NVIDIA
S
Shyamala Prayaga
NVIDIA
Oleksii Kuchaiev
Oleksii Kuchaiev
NVIDIA
machine learningdeep learninggraph theorybioinformatics
Boris Ginsburg
Boris Ginsburg
NVIDIA
Deep LearningSpeech RecognitionSpeech Synthesis
O
O. Olabiyi
NVIDIA
K
Kari Briski
NVIDIA
J
Jonathan Cohen
NVIDIA
Bryan Catanzaro
Bryan Catanzaro
NVIDIA
Parallel ComputingMachine Learning
J
Jonah Alben
NVIDIA
Yonatan Geifman
Yonatan Geifman
NVIDIA
Machine LearningDeep Learning
E
Eric Chung
NVIDIA