Controlling Thinking Speed in Reasoning Models

📅 2025-07-04
🤖 AI Summary
Large reasoning models (LRMs) face a fundamental trade-off between rapid response and deep reasoning, resulting in high computational overhead and latency. To address this, we propose a training-free, dynamic thinking-speed control method. Our approach explicitly models “fast/slow thinking” states in representation space via directed guidance vectors and enables online switching of reasoning paths based on real-time task difficulty estimation. Integrated into the vLLM framework, it performs lightweight inference policy adaptation at test time through prompt signals and representation editing—without architectural modification or fine-tuning. Evaluated across mainstream LRMs and multiple benchmarks, our method achieves an average accuracy gain of 1.3% while reducing token consumption by 8.6%, significantly improving the accuracy–efficiency trade-off. Key contributions include: (i) the first explicit representation-space modeling of cognitive speed states via guidance vectors; (ii) real-time, difficulty-aware reasoning-path selection; and (iii) a plug-and-play, zero-shot inference optimization framework compatible with existing LRM deployments.

📝 Abstract
Human cognition is theorized to operate in two modes: fast, intuitive System 1 thinking and slow, deliberate System 2 thinking. While current Large Reasoning Models (LRMs) excel at System 2 thinking, their inability to perform fast thinking leads to high computational overhead and latency. In this work, we enable LRMs to approximate human intelligence through dynamic thinking speed adjustment, optimizing accuracy-efficiency trade-offs. Our approach addresses two key questions: (1) how to control thinking speed in LRMs, and (2) when to adjust it for optimal performance. For the first question, we identify the steering vector that governs slow-fast thinking transitions in LRMs' representation space. Using this vector, we achieve the first representation editing-based test-time scaling effect, outperforming existing prompt-based scaling methods. For the second question, we apply real-time difficulty estimation to signal reasoning segments of varying complexity. Combining these techniques, we propose the first reasoning strategy that enables fast processing of easy steps and deeper analysis for complex reasoning. Without any training or additional cost, our plug-and-play method yields an average +1.3% accuracy with -8.6% token usage across leading LRMs and advanced reasoning benchmarks. All of our algorithms are implemented based on vLLM and are expected to support broader applications and inspire future research.
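The abstract's core mechanism — identifying a direction in representation space that separates slow and fast thinking, then shifting hidden states along it at test time — can be sketched as a difference-of-means steering vector. The function names, the synthetic activations, and the scaling factor `alpha` below are illustrative assumptions for a minimal sketch, not the paper's actual vLLM-integrated implementation:

```python
import numpy as np

def steering_vector(slow_acts, fast_acts):
    # Difference-of-means direction pointing from "slow thinking"
    # activations toward "fast thinking" activations, unit-normalized.
    v = fast_acts.mean(axis=0) - slow_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def edit_hidden_state(h, v, alpha):
    # Representation editing: shift a hidden state along the steering
    # direction. alpha > 0 pushes toward fast thinking, alpha < 0 toward slow.
    return h + alpha * v

# Toy activations standing in for hidden states collected under
# slow-thinking vs. fast-thinking prompts (32 samples, 64 dims).
rng = np.random.default_rng(0)
slow = rng.normal(0.0, 1.0, (32, 64))
fast = rng.normal(0.5, 1.0, (32, 64))

v = steering_vector(slow, fast)
h = rng.normal(size=64)
h_fast = edit_hidden_state(h, v, alpha=2.0)  # steered toward fast thinking
```

In the paper's setting the edit would be applied to a decoder layer's hidden states during generation; here the shift is shown on a single vector for clarity.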
Problem

Research questions and friction points this paper is trying to address.

Enable LRMs to adjust thinking speed dynamically
Control slow-fast thinking transitions in LRMs
Optimize accuracy-efficiency trade-offs in reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic thinking speed adjustment in LRMs
Representation editing-based test-time scaling
Real-time difficulty estimation for reasoning
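The difficulty-estimation idea above can be illustrated with a toy switching policy that reads per-token log-probabilities (the kind of signal a serving engine such as vLLM can expose) and steers toward slow thinking on low-confidence segments. The difficulty proxy, the threshold, and the `alpha` values are illustrative assumptions, not the paper's estimator:

```python
import math

def segment_difficulty(token_logprobs):
    # Mean negative log-probability as a crude difficulty proxy:
    # low-confidence (low-logprob) tokens suggest a harder segment.
    return -sum(token_logprobs) / len(token_logprobs)

def choose_alpha(token_logprobs, threshold=1.0, alpha_fast=2.0, alpha_slow=-2.0):
    # Hard segment -> steer toward slow thinking; easy segment -> fast.
    return alpha_slow if segment_difficulty(token_logprobs) > threshold else alpha_fast

easy = [math.log(0.9)] * 8   # confident tokens
hard = [math.log(0.2)] * 8   # uncertain tokens

choose_alpha(easy)  # -> 2.0 (fast thinking)
choose_alpha(hard)  # -> -2.0 (slow thinking)
```

The chosen `alpha` would then scale the steering vector applied to hidden states, so easy reasoning segments are processed quickly while complex ones receive deeper deliberation.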
Zhengkai Lin
State Key Lab of CAD&CG, Zhejiang University

Zhihang Fu
Alibaba Cloud
Computer Vision · Machine Learning · LLM

Ze Chen
Alibaba Group
Computer Vision

Chao Chen
Alibaba Cloud

Liang Xie
Wuhan University of Technology
Time Series Forecasting · Cross-modal Learning

Wenxiao Wang
School of Software Technology, Zhejiang University

Deng Cai
Professor of Computer Science, Zhejiang University
Machine Learning · Computer Vision · Data Mining · Information Retrieval

Zheng Wang
Alibaba Cloud

Jieping Ye
Alibaba Cloud