Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts

📅 2026-02-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work proposes a 3-billion-parameter general-purpose language model that unifies strong reasoning, human preference alignment, code generation, and complex tool use within a compact architecture. The model combines pointwise and pairwise reward modeling to strengthen both reasoning and alignment, employs a complexity-aware reinforcement learning reward to optimize code generation for correctness and efficiency, and leverages multi-turn synthetic data with turn-level supervised training to support extended tool interactions. Presented as the first open-source small-scale system to integrate agentic behavior, code synthesis, and general-purpose reasoning, it significantly outperforms comparably sized models such as Nanbeige4-3B-2511 and Qwen3-4B across multiple benchmarks, and even surpasses the much larger 30B-parameter Qwen3-30B-A3B, demonstrating that small models can achieve both breadth and depth of capability.
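The summary's combination of pointwise and pairwise reward modeling can be sketched as follows. This is a minimal illustration, assuming a binary-cross-entropy pointwise objective and a Bradley-Terry pairwise objective mixed with a hypothetical weight `lam`; the paper's exact formulation may differ.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pointwise_loss(score, label):
    # Pointwise reward modeling: score a single response against an
    # absolute quality label in {0, 1} via binary cross-entropy.
    p = sigmoid(score)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def pairwise_loss(score_chosen, score_rejected):
    # Pairwise reward modeling (Bradley-Terry): the chosen response
    # should outscore the rejected one; loss shrinks as the margin grows.
    return -math.log(sigmoid(score_chosen - score_rejected))

def combined_loss(score, label, score_chosen, score_rejected, lam=0.5):
    # Hypothetical mixing weight `lam`; the paper does not specify how
    # the two objectives are balanced.
    return pointwise_loss(score, label) + lam * pairwise_loss(score_chosen, score_rejected)
```

In this sketch the pointwise term anchors scores to an absolute quality scale while the pairwise term enforces preference orderings, which is one plausible way the two signals could complement each other for alignment.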

📝 Abstract
We present Nanbeige4.1-3B, a unified generalist language model that simultaneously achieves strong agentic behavior, code generation, and general reasoning with only 3B parameters. To the best of our knowledge, it is the first open-source small language model (SLM) to achieve such versatility in a single model. To improve reasoning and preference alignment, we combine pointwise and pairwise reward modeling, ensuring high-quality, human-aligned responses. For code generation, we design complexity-aware rewards in reinforcement learning, optimizing for both correctness and efficiency. For deep search, we synthesize complex multi-turn data and incorporate turn-level supervision during training. This enables stable long-horizon tool interactions, allowing Nanbeige4.1-3B to reliably execute up to 600 tool-call turns when solving complex problems. Extensive experiments show that Nanbeige4.1-3B significantly outperforms prior models of similar scale, such as Nanbeige4-3B-2511 and Qwen3-4B, and even surpasses much larger models such as Qwen3-30B-A3B. Our results demonstrate that small models can achieve broad competence and strong specialization simultaneously, redefining the potential of 3B-parameter models.
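The abstract's complexity-aware reward for code generation might be shaped roughly as below. This is a hedged sketch under assumed conventions, not the authors' actual design: `pass_rate` is taken as the fraction of unit tests passed, `complexity` as a normalized problem-difficulty score in [0, 1], and the efficiency bonus as a hypothetical clipped speedup over a baseline solution.

```python
def complexity_aware_reward(pass_rate, runtime, baseline_runtime, complexity, alpha=0.3):
    """Hypothetical reward: correctness dominates, and an efficiency bonus
    counts for more on harder (higher-complexity) problems.

    pass_rate, complexity in [0, 1]; runtimes in seconds; alpha caps the bonus.
    """
    if pass_rate < 1.0:
        # Partial credit for incorrect solutions; no efficiency bonus.
        return pass_rate
    # Efficiency bonus: positive when faster than the baseline, clipped to [-1, 1].
    speedup = (baseline_runtime - runtime) / max(baseline_runtime, 1e-9)
    speedup = max(-1.0, min(1.0, speedup))
    return 1.0 + alpha * complexity * speedup
```

Scaling the efficiency term by `complexity` is one way to read "optimizing both correctness and efficiency": on trivial problems almost any correct solution is fast enough, while on hard problems an efficient solution earns extra reward.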
Problem

Research questions and friction points this paper is trying to address.

small language model
agentic behavior
code generation
general reasoning
model versatility
Innovation

Methods, ideas, or system contributions that make the work stand out.

reward modeling
complexity-aware reinforcement learning
tool-use reasoning
small language model
turn-level supervision
Chen Yang
Nanbeige LLM Lab, Boss Zhipin
Guangyue Peng
Peking University
Jiaying Zhu
Nanbeige LLM Lab, Boss Zhipin
Ran Le
Nanbeige LLM Lab, Boss Zhipin
Ruixiang Feng
Nanbeige LLM Lab, Boss Zhipin
Tao Zhang
Nanbeige LLM Lab, Boss Zhipin
Xiyun Xu
Nanbeige LLM Lab, Boss Zhipin
Yang Song
Nanbeige LLM Lab, Boss Zhipin
Yiming Jia
Nanbeige LLM Lab, Boss Zhipin
Yuntao Wen
Nanbeige LLM Lab, Boss Zhipin
Yunzhi Xu
Nanbeige LLM Lab, Boss Zhipin
Zekai Wang
Nanbeige LLM Lab, Boss Zhipin
Zhenwei An
Nanbeige LLM Lab, Boss Zhipin
Zhicong Sun
Nanbeige LLM Lab, Boss Zhipin
Zongchao Chen
Nanbeige LLM Lab, Boss Zhipin