INTELLECT-3: Technical Report

📅 2025-12-17
🤖 AI Summary
This work addresses the challenge of efficiently training large-scale Mixture-of-Experts (MoE) models that generalize strongly across mathematical reasoning, code generation, scientific problem-solving, and complex reasoning tasks. We propose a 106B-parameter MoE model with only 12B parameters activated per forward pass, trained via a custom end-to-end reinforcement learning (RL) pipeline. Our approach leverages prime-rl, an open-source asynchronous RL framework we introduce that scales to thousands of GPUs and supports multi-turn interactive training, integrated with a curated library of verifier environments and the GLM-4.5-Air-Base foundation model. The resulting model achieves state-of-the-art performance for its size on multiple benchmarks, outperforming larger frontier models despite its modest active parameter count. We fully open-source the model weights, RL training infrastructure, reproducible training recipes, and diverse verification environments, establishing a systematic, transparent, and scalable foundation for open MoE-RL research.

📝 Abstract
We present INTELLECT-3, a 106B-parameter Mixture-of-Experts model (12B active) trained with large-scale reinforcement learning on our end-to-end RL infrastructure stack. INTELLECT-3 achieves state-of-the-art performance for its size across math, code, science, and reasoning benchmarks, outperforming many larger frontier models. We open-source the model together with the full infrastructure stack used to create it: our RL frameworks, the complete training recipe, and a wide collection of environments for training and evaluation, built with the verifiers library and hosted on our Environments Hub community platform. Built for this effort, we introduce prime-rl, an open framework for large-scale asynchronous reinforcement learning that scales seamlessly from a single node to thousands of GPUs and is tailored for agentic RL, with first-class support for multi-turn interactions and tool use. Using this stack, we run both SFT and RL training on top of the GLM-4.5-Air-Base model, scaling RL training up to 512 H200 GPUs with high training efficiency.
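The 106B-total / 12B-active split comes from sparse expert routing: each token is sent to only a few experts, so most of the model's weights sit idle on any given forward pass. The sketch below is a minimal numpy illustration of top-k routing; all names, shapes, and the single-layer setup are illustrative assumptions, not the actual GLM-4.5-Air-Base architecture.

```python
import numpy as np

def topk_moe_layer(x, experts_w, gate_w, k=2):
    """Sparse MoE layer: route one token to its top-k experts.

    x:         (d,) token hidden state
    experts_w: (n_experts, d, d) per-expert weight matrices
    gate_w:    (n_experts, d) router weights
    Only k of n_experts weight matrices are touched per token, which is
    how a model's active parameter count can be a small fraction of its
    total parameter count.
    """
    logits = gate_w @ x                        # (n_experts,) router scores
    top = np.argsort(logits)[-k:]              # indices of the k best experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                       # softmax over selected experts only
    # Weighted combination of just the selected experts' outputs.
    return sum(p * (experts_w[i] @ x) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
out = topk_moe_layer(rng.normal(size=d),
                     rng.normal(size=(n_experts, d, d)),
                     rng.normal(size=(n_experts, d)),
                     k=2)
print(out.shape)  # (16,)
```

With k=2 of 8 experts selected, only a quarter of the expert weights participate in this token's computation, mirroring (at toy scale) the 12B-of-106B activation ratio.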
Problem

Research questions and friction points this paper is trying to address.

Develop a 106B-parameter Mixture-of-Experts model for math, code, science, and reasoning
Open-source the full infrastructure stack, including RL frameworks and environments
Introduce a scalable RL framework for agentic training with multi-turn interactions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts model with large-scale reinforcement learning
Open-source infrastructure stack including RL frameworks and environments
prime-rl framework for scalable asynchronous agentic RL
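The asynchronous design the abstract describes decouples rollout generation from optimization: workers interact with environments and stream episodes into a buffer while the trainer consumes them concurrently, rather than alternating in lockstep. The following is a hypothetical stand-alone sketch of that actor/learner pattern using Python threads and a bounded queue; it does not use prime-rl's actual API, and the episode and "gradient step" bodies are placeholders.

```python
import queue
import random
import threading

# Bounded buffer between rollout workers and the trainer; the bound
# limits how stale the data a worker produces can get relative to the
# policy the trainer is updating.
rollouts = queue.Queue(maxsize=8)

def rollout_worker(worker_id, n_episodes):
    """Generate episodes and push them to the shared buffer."""
    for _ in range(n_episodes):
        # Stand-in for a multi-turn environment interaction:
        # (observation-or-turn, reward) pairs.
        episode = [(f"turn-{t}", random.random()) for t in range(3)]
        rollouts.put((worker_id, episode))

def trainer(n_updates):
    """Consume episodes and perform (placeholder) update steps."""
    total_reward, seen = 0.0, 0
    while seen < n_updates:
        _worker_id, episode = rollouts.get()      # blocks until data arrives
        total_reward += sum(r for _, r in episode)
        seen += 1                                 # stand-in for a gradient step
    return seen

workers = [threading.Thread(target=rollout_worker, args=(i, 4)) for i in range(2)]
for w in workers:
    w.start()
updates = trainer(n_updates=8)
for w in workers:
    w.join()
print(updates)  # 8
```

Scaling this pattern to thousands of GPUs replaces the in-process queue with distributed transport and adds off-policy corrections for the generation/training lag, but the producer/consumer decoupling is the core idea.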
Authors

Prime Intellect Team: Mika Senghaas, Fares Obeid, Sami Jaghouar, William Brown, Jack Min Ong, Daniel Auras, Matej Sirovatka, Jannik Straube, Andrew Baker, Sebastian Müller, Justus Mattern, Manveer Basra, Aiman Ismail, Dominik Scherm, Cooper Miller, Ameen Patel, Simon Kirsten, Mario Sieg, Christian Reetz, Kemal Erdem, Vincent Weisser, Johannes Hagemann

Prime Intellect, Inc.