Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

📅 2025-11-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional wisdom holds that small language models (SLMs) are inherently limited in mathematical reasoning due to their restricted parameter count. Method: This paper introduces VibeThinker-1.5B—a 1.5-billion-parameter model—grounded in the Spectrum-to-Signal Principle (SSP), which optimizes reasoning through two synergistic mechanisms: (i) two-stage diversity-exploring distillation for data-level knowledge transfer, and (ii) maximum-entropy-guided policy optimization for strategy-level refinement. Contribution/Results: VibeThinker-1.5B surpasses the 400× larger DeepSeek R1 on the math benchmarks AIME24/25 and HMMT25, and scores 51.1 on the coding benchmark LiveCodeBench v6, edging out Magistral Medium. Crucially, it attains this with only $7,800 in training cost, drastically reducing computational requirements and establishing a new paradigm for efficient, high-performance reasoning in compact models.

📝 Abstract
Challenging the prevailing consensus that small models inherently lack robust reasoning, this report introduces VibeThinker-1.5B, a 1.5B-parameter dense model developed via our Spectrum-to-Signal Principle (SSP). This runs counter to the dominant approach of scaling model parameters to enhance capability, as seen in models like DeepSeek R1 (671B) and Kimi k2 (>1T). The SSP framework first employs Two-Stage Diversity-Exploring Distillation (SFT) to generate a broad spectrum of solutions, then applies MaxEnt-Guided Policy Optimization (RL) to amplify the correct signal. With a total training cost of only $7,800, VibeThinker-1.5B demonstrates superior reasoning capabilities compared to closed-source models like Magistral Medium and Claude Opus 4, and performs on par with open-source models like GPT OSS-20B Medium. Remarkably, it surpasses the 400× larger DeepSeek R1 on three math benchmarks: AIME24 (80.3 vs. 79.8), AIME25 (74.4 vs. 70.0), and HMMT25 (50.4 vs. 41.7)—a substantial improvement over its base model (6.7, 4.3, and 0.6, respectively). On LiveCodeBench V6, it scores 51.1, outperforming Magistral Medium's 50.3 and its base model's 0.0. These findings demonstrate that small models can achieve reasoning capabilities comparable to large models, drastically reducing training and inference costs and thereby democratizing advanced AI research.
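The "spectrum" idea—tuning the SFT stage for solution diversity rather than single-shot accuracy—can be illustrated with a Pass@K-style selection criterion. This is a hedged sketch: the paper's exact checkpoint-selection rule is not reproduced here, and the checkpoints and numbers below are hypothetical; only the unbiased Pass@K estimator itself is standard.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@K estimator: probability that at least one of k
    samples drawn from n attempts (c of them correct) is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

def checkpoint_score(results, k: int) -> float:
    """Average Pass@K over problems; results is a list of
    (n_samples, n_correct) pairs, one per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# Two hypothetical SFT checkpoints: A is "sharp" (solves one problem
# reliably), B is "diverse" (solves every problem occasionally).
# A diversity-exploring criterion would prefer B as the RL start.
ckpt_a = [(8, 4), (8, 0), (8, 0)]
ckpt_b = [(8, 1), (8, 1), (8, 1)]
print("Pass@1:", checkpoint_score(ckpt_a, 1), checkpoint_score(ckpt_b, 1))
print("Pass@8:", checkpoint_score(ckpt_a, 8), checkpoint_score(ckpt_b, 8))
```

Checkpoint A wins on Pass@1, but B wins on Pass@8—a broader "spectrum" of occasionally correct paths gives the subsequent RL stage more correct signal to amplify.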
Problem

Research questions and friction points this paper is trying to address.

Challenging the belief that small models lack robust reasoning capabilities
Developing efficient methods to enhance reasoning in small-parameter models
Reducing training costs while achieving large-model level performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-Stage Diversity-Exploring Distillation for solution generation
MaxEnt-Guided Policy Optimization to amplify correct signals
Spectrum-to-Signal Principle framework reduces training costs
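One way to read "MaxEnt-guided" is as an entropy term steering the RL objective so the policy keeps exploring until the correct signal dominates. The sketch below is illustrative only—the paper's actual MGPO objective is not given here—and shows a toy REINFORCE-style update over answer candidates with an explicit entropy bonus:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def maxent_pg_step(logits, rewards, beta=0.1, lr=0.5):
    """One gradient-ascent step on J = E[R] + beta * H(pi).
    d(E[R])/dz_i = p_i * (r_i - E[R]);  dH/dz_i = p_i * (-log p_i - H)."""
    p = softmax(logits)
    baseline = sum(pi * ri for pi, ri in zip(p, rewards))
    h = entropy(p)
    grad = [pi * (ri - baseline) + beta * pi * (-math.log(pi) - h)
            for pi, ri in zip(p, rewards)]
    return [z + lr * g for z, g in zip(logits, grad)]

# Toy setting: three answer candidates, only the first is correct.
rewards = [1.0, 0.0, 0.0]
z_explore, z_greedy = [0.0] * 3, [0.0] * 3
for _ in range(200):
    z_explore = maxent_pg_step(z_explore, rewards, beta=0.2)
    z_greedy = maxent_pg_step(z_greedy, rewards, beta=0.01)
p_explore, p_greedy = softmax(z_explore), softmax(z_greedy)
print(p_explore, entropy(p_explore))
print(p_greedy, entropy(p_greedy))
```

Both runs learn to favor the correct candidate, but the larger entropy weight keeps residual probability on alternatives—the "spectrum" is preserved while the correct signal is amplified, rather than collapsing immediately to a single mode.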
Sen Xu (Sina Weibo Inc.)
Yi Zhou (Sina Weibo Inc.)
Wei Wang (Sina Weibo Inc.)
Jixin Min (Sina Weibo Inc.)
Zhibin Yin (Sina Weibo Inc.)
Yingwei Dai (Sina Weibo Inc.)
Shixi Liu (Sina Weibo Inc.)
Lianyu Pang (Sina Weibo Inc.)
Yirong Chen (Stanford University)
Junlin Zhang (Sina Weibo Inc.)