🤖 AI Summary
This work addresses the rigid inference pacing of large reasoning models (LRMs) at test time. AlphaOne is a framework enabling dynamic, dense, and non-monotonic control over the transition between “slow thinking” (reasoning) and “fast answering” (answer generation). Its core device is a universal parameter α that defines the “α moment”: before this moment, reasoning transition tokens are stochastically inserted via a Bernoulli process to support adaptive slow thinking; after it, slow thinking is deterministically terminated with the end-of-thinking token, shifting the model to rapid answer generation. This test-time scheme requires no additional training, and it unifies and generalizes existing monotonic scaling strategies. Evaluated on mathematical, coding, and scientific reasoning benchmarks, AlphaOne significantly improves accuracy while reducing inference steps by up to 47%, without compromising—and often enhancing—answer quality.
📝 Abstract
This paper presents AlphaOne ($\alpha$1), a universal framework for modulating reasoning progress in large reasoning models (LRMs) at test time. $\alpha$1 first introduces the $\alpha$ moment, which represents the scaled thinking phase governed by a universal parameter $\alpha$. Within this scaled pre-$\alpha$-moment phase, it dynamically schedules slow-thinking transitions by modeling the insertion of reasoning transition tokens as a Bernoulli stochastic process. After the $\alpha$ moment, $\alpha$1 deterministically terminates slow thinking with the end-of-thinking token, thereby fostering fast reasoning and efficient answer generation. This approach unifies and generalizes existing monotonic scaling methods by enabling flexible and dense slow-to-fast reasoning modulation. Extensive empirical studies on challenging benchmarks across mathematical, coding, and scientific domains demonstrate $\alpha$1's superior reasoning capability and efficiency. Project page: https://alphaone-project.github.io/
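The two-phase schedule described above—Bernoulli insertion of reasoning transition tokens before the $\alpha$ moment, deterministic termination afterward—can be sketched in a few lines. This is a minimal illustration under stated assumptions: the concrete token strings (`"wait"` as a transition token, `"</think>"` as the end-of-thinking token) and the function name are hypothetical, not taken from the paper.

```python
import random

def alpha_one_schedule(step: int, alpha_moment: int, p_insert: float):
    """Decide what control token (if any) to inject at a decoding step.

    Hypothetical sketch of AlphaOne's slow-to-fast modulation:
    - step < alpha_moment: insert a slow-thinking transition token
      with Bernoulli probability p_insert (adaptive slow thinking).
    - step == alpha_moment: deterministically emit the end-of-thinking
      token, switching the model to fast answer generation.
    - step > alpha_moment: no further intervention.
    """
    if step < alpha_moment:
        # Pre-alpha-moment phase: stochastic insertion (Bernoulli trial).
        return "wait" if random.random() < p_insert else None
    if step == alpha_moment:
        # Alpha moment reached: terminate slow thinking deterministically.
        return "</think>"
    return None

# Example: with p_insert=1.0 every pre-alpha step inserts a transition token.
print(alpha_one_schedule(0, 5, 1.0))  # "wait"
print(alpha_one_schedule(5, 5, 1.0))  # "</think>"
```

A real implementation would interleave these decisions with the LRM's token sampling loop and derive `alpha_moment` from $\alpha$ and the model's average thinking length; the sketch only isolates the scheduling logic.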