Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

📅 2025-11-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional wisdom holds that small language models (SLMs) are inherently limited in mathematical reasoning due to their restricted parameter count. Method: This paper introduces VibeThinker-1.5B—a 1.5-billion-parameter model—grounded in the Spectrum-to-Signal Principle (SSP), which optimizes reasoning through two synergistic mechanisms: (i) two-stage diversity-exploring distillation for data-level knowledge transfer, and (ii) maximum-entropy-guided policy optimization for strategy-level refinement. Contribution/Results: VibeThinker-1.5B surpasses the 400× larger DeepSeek R1 on the math benchmarks AIME24/25 and HMMT25, and scores 51.1 on the coding benchmark LiveCodeBench v6, edging out Magistral Medium. Crucially, it attains this with only $7,800 in training cost, drastically reducing computational requirements and establishing a new paradigm for efficient, high-performance reasoning in compact models.

📝 Abstract
Challenging the prevailing consensus that small models inherently lack robust reasoning, this report introduces VibeThinker-1.5B, a 1.5B-parameter dense model developed via our Spectrum-to-Signal Principle (SSP). This runs counter to the dominant approach of scaling model parameters to enhance capability, as seen in models like DeepSeek R1 (671B) and Kimi k2 (>1T). The SSP framework first employs Two-Stage Diversity-Exploring Distillation (SFT) to generate a broad spectrum of solutions, then applies MaxEnt-Guided Policy Optimization (RL) to amplify the correct signal. With a total training cost of only $7,800, VibeThinker-1.5B demonstrates superior reasoning capabilities compared to closed-source models like Magistral Medium and Claude Opus 4, and performs on par with open-source models like GPT OSS-20B Medium. Remarkably, it surpasses the 400× larger DeepSeek R1 on three math benchmarks: AIME24 (80.3 vs. 79.8), AIME25 (74.4 vs. 70.0), and HMMT25 (50.4 vs. 41.7)—a substantial improvement over its base model (6.7, 4.3, and 0.6, respectively). On LiveCodeBench V6, it scores 51.1, outperforming Magistral Medium's 50.3 and its base model's 0.0. These findings demonstrate that small models can achieve reasoning capabilities comparable to large models, drastically reducing training and inference costs and thereby democratizing advanced AI research.
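The "spectrum" idea—tuning the SFT stage for solution diversity rather than single-shot accuracy—can be illustrated with a Pass@K-style selection criterion. This is a hedged sketch: the paper's exact checkpoint-selection rule is not reproduced here, and the checkpoints and numbers below are hypothetical; only the unbiased Pass@K estimator itself is standard.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@K estimator: probability that at least one of k
    samples drawn from n attempts (c of them correct) is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

def checkpoint_score(results, k: int) -> float:
    """Average Pass@K over problems; results is a list of
    (n_samples, n_correct) pairs, one per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# Two hypothetical SFT checkpoints: A is "sharp" (solves one problem
# reliably), B is "diverse" (solves every problem occasionally).
# A diversity-exploring criterion would prefer B as the RL start.
ckpt_a = [(8, 4), (8, 0), (8, 0)]
ckpt_b = [(8, 1), (8, 1), (8, 1)]
print("Pass@1:", checkpoint_score(ckpt_a, 1), checkpoint_score(ckpt_b, 1))
print("Pass@8:", checkpoint_score(ckpt_a, 8), checkpoint_score(ckpt_b, 8))
```

Checkpoint A wins on Pass@1, but B wins on Pass@8—a broader "spectrum" of occasionally correct paths gives the subsequent RL stage more correct signal to amplify.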
Problem

Research questions and friction points this paper is trying to address.

Challenging the belief that small models lack robust reasoning capabilities
Developing efficient methods to enhance reasoning in small-parameter models
Reducing training costs while achieving large-model level performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-Stage Diversity-Exploring Distillation for solution generation
MaxEnt-Guided Policy Optimization to amplify correct signals
Spectrum-to-Signal Principle framework reduces training costs
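One way to read "MaxEnt-guided" is as an entropy term steering the RL objective so the policy keeps exploring until the correct signal dominates. The sketch below is illustrative only—the paper's actual MGPO objective is not given here—and shows a toy REINFORCE-style update over answer candidates with an explicit entropy bonus:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def maxent_pg_step(logits, rewards, beta=0.1, lr=0.5):
    """One gradient-ascent step on J = E[R] + beta * H(pi).
    d(E[R])/dz_i = p_i * (r_i - E[R]);  dH/dz_i = p_i * (-log p_i - H)."""
    p = softmax(logits)
    baseline = sum(pi * ri for pi, ri in zip(p, rewards))
    h = entropy(p)
    grad = [pi * (ri - baseline) + beta * pi * (-math.log(pi) - h)
            for pi, ri in zip(p, rewards)]
    return [z + lr * g for z, g in zip(logits, grad)]

# Toy setting: three answer candidates, only the first is correct.
rewards = [1.0, 0.0, 0.0]
z_explore, z_greedy = [0.0] * 3, [0.0] * 3
for _ in range(200):
    z_explore = maxent_pg_step(z_explore, rewards, beta=0.2)
    z_greedy = maxent_pg_step(z_greedy, rewards, beta=0.01)
p_explore, p_greedy = softmax(z_explore), softmax(z_greedy)
print(p_explore, entropy(p_explore))
print(p_greedy, entropy(p_greedy))
```

Both runs learn to favor the correct candidate, but the larger entropy weight keeps residual probability on alternatives—the "spectrum" is preserved while the correct signal is amplified, rather than collapsing immediately to a single mode.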
Sen Xu (Sina Weibo Inc.)
Yi Zhou (Sina Weibo Inc.)
Wei Wang (Sina Weibo Inc.)
Jixin Min (Sina Weibo Inc.)
Zhibin Yin (Sina Weibo Inc.)
Yingwei Dai (Sina Weibo Inc.)
Shixi Liu (Sina Weibo Inc.)
Lianyu Pang (Sina Weibo Inc.)
Yirong Chen (Stanford University)
Junlin Zhang (Sina Weibo Inc.)