Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts

📅 2026-02-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work proposes a 3-billion-parameter general-purpose language model that unifies strong reasoning, human preference alignment, code generation, and complex tool use within a compact architecture. The model combines pointwise and pairwise reward modeling to strengthen both reasoning and alignment, employs a complexity-aware reinforcement learning reward to optimize code generation for correctness and efficiency, and leverages multi-turn synthetic data with turn-level supervised training to support extended tool interactions. Presented as the first open-source small-scale system to integrate agentic behavior, code synthesis, and general-purpose reasoning, it significantly outperforms comparably sized models such as Nanbeige4-3B-2511 and Qwen3-4B across multiple benchmarks, and even surpasses the much larger 30B-parameter Qwen3-30B-A3B, demonstrating that small models can achieve both breadth and depth of capability.
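The summary's combination of pointwise and pairwise reward modeling can be sketched as follows. This is a minimal illustration, assuming a binary-cross-entropy pointwise objective and a Bradley-Terry pairwise objective mixed with a hypothetical weight `lam`; the paper's exact formulation may differ.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pointwise_loss(score, label):
    # Pointwise reward modeling: score a single response against an
    # absolute quality label in {0, 1} via binary cross-entropy.
    p = sigmoid(score)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def pairwise_loss(score_chosen, score_rejected):
    # Pairwise reward modeling (Bradley-Terry): the chosen response
    # should outscore the rejected one; loss shrinks as the margin grows.
    return -math.log(sigmoid(score_chosen - score_rejected))

def combined_loss(score, label, score_chosen, score_rejected, lam=0.5):
    # Hypothetical mixing weight `lam`; the paper does not specify how
    # the two objectives are balanced.
    return pointwise_loss(score, label) + lam * pairwise_loss(score_chosen, score_rejected)
```

In this sketch the pointwise term anchors scores to an absolute quality scale while the pairwise term enforces preference orderings, which is one plausible way the two signals could complement each other for alignment.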

📝 Abstract
We present Nanbeige4.1-3B, a unified generalist language model that simultaneously achieves strong agentic behavior, code generation, and general reasoning with only 3B parameters. To the best of our knowledge, it is the first open-source small language model (SLM) to achieve such versatility in a single model. To improve reasoning and preference alignment, we combine pointwise and pairwise reward modeling, ensuring high-quality, human-aligned responses. For code generation, we design complexity-aware rewards in reinforcement learning, optimizing for both correctness and efficiency. For deep search, we synthesize complex multi-turn data and incorporate turn-level supervision during training. This enables stable long-horizon tool interactions, allowing Nanbeige4.1-3B to reliably execute up to 600 tool-call turns when solving complex problems. Extensive experiments show that Nanbeige4.1-3B significantly outperforms prior models of similar scale, such as Nanbeige4-3B-2511 and Qwen3-4B, and even surpasses much larger models such as Qwen3-30B-A3B. Our results demonstrate that small models can achieve broad competence and strong specialization simultaneously, redefining the potential of 3B-parameter models.
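The abstract's complexity-aware reward for code generation might be shaped roughly as below. This is a hedged sketch under assumed conventions, not the authors' actual design: `pass_rate` is taken as the fraction of unit tests passed, `complexity` as a normalized problem-difficulty score in [0, 1], and the efficiency bonus as a hypothetical clipped speedup over a baseline solution.

```python
def complexity_aware_reward(pass_rate, runtime, baseline_runtime, complexity, alpha=0.3):
    """Hypothetical reward: correctness dominates, and an efficiency bonus
    counts for more on harder (higher-complexity) problems.

    pass_rate, complexity in [0, 1]; runtimes in seconds; alpha caps the bonus.
    """
    if pass_rate < 1.0:
        # Partial credit for incorrect solutions; no efficiency bonus.
        return pass_rate
    # Efficiency bonus: positive when faster than the baseline, clipped to [-1, 1].
    speedup = (baseline_runtime - runtime) / max(baseline_runtime, 1e-9)
    speedup = max(-1.0, min(1.0, speedup))
    return 1.0 + alpha * complexity * speedup
```

Scaling the efficiency term by `complexity` is one way to read "optimizing both correctness and efficiency": on trivial problems almost any correct solution is fast enough, while on hard problems an efficient solution earns extra reward.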
Problem

Research questions and friction points this paper is trying to address.

small language model
agentic behavior
code generation
general reasoning
model versatility
Innovation

Methods, ideas, or system contributions that make the work stand out.

reward modeling
complexity-aware reinforcement learning
tool-use reasoning
small language model
turn-level supervision
Chen Yang
Nanbeige LLM Lab, Boss Zhipin
Guangyue Peng
Peking University
Jiaying Zhu
Nanbeige LLM Lab, Boss Zhipin
Ran Le
Nanbeige LLM Lab, Boss Zhipin
Ruixiang Feng
Nanbeige LLM Lab, Boss Zhipin
Tao Zhang
Nanbeige LLM Lab, Boss Zhipin
Xiyun Xu
Nanbeige LLM Lab, Boss Zhipin
Yang Song
Nanbeige LLM Lab, Boss Zhipin
Yiming Jia
Nanbeige LLM Lab, Boss Zhipin
Yuntao Wen
Nanbeige LLM Lab, Boss Zhipin
Yunzhi Xu
Nanbeige LLM Lab, Boss Zhipin
Zekai Wang
Nanbeige LLM Lab, Boss Zhipin
Zhenwei An
Nanbeige LLM Lab, Boss Zhipin
Zhicong Sun
Nanbeige LLM Lab, Boss Zhipin
Zongchao Chen
Nanbeige LLM Lab, Boss Zhipin