Controlling Thinking Speed in Reasoning Models

📅 2025-07-04
🤖 AI Summary
Large reasoning models (LRMs) face a fundamental trade-off between rapid response and deep reasoning, resulting in high computational overhead and latency. To address this, we propose a training-free, dynamic thinking-speed control method. Our approach explicitly models “fast/slow thinking” states in representation space via directed guidance vectors and enables online switching of reasoning paths based on real-time task difficulty estimation. Integrated into the vLLM framework, it performs lightweight inference policy adaptation at test time through prompt signals and representation editing—without architectural modification or fine-tuning. Evaluated across mainstream LRMs and multiple benchmarks, our method achieves an average accuracy gain of 1.3% while reducing token consumption by 8.6%, significantly improving the accuracy–efficiency trade-off. Key contributions include: (i) the first explicit representation-space modeling of cognitive speed states via guidance vectors; (ii) real-time, difficulty-aware reasoning-path selection; and (iii) a plug-and-play, zero-shot inference optimization framework compatible with existing LRM deployments.

📝 Abstract
Human cognition is theorized to operate in two modes: fast, intuitive System 1 thinking and slow, deliberate System 2 thinking. While current Large Reasoning Models (LRMs) excel at System 2 thinking, their inability to perform fast thinking leads to high computational overhead and latency. In this work, we enable LRMs to approximate human intelligence through dynamic thinking speed adjustment, optimizing accuracy-efficiency trade-offs. Our approach addresses two key questions: (1) how to control thinking speed in LRMs, and (2) when to adjust it for optimal performance. For the first question, we identify the steering vector that governs slow-fast thinking transitions in LRMs' representation space. Using this vector, we achieve the first representation editing-based test-time scaling effect, outperforming existing prompt-based scaling methods. For the second question, we apply real-time difficulty estimation to signal reasoning segments of varying complexity. Combining these techniques, we propose the first reasoning strategy that enables fast processing of easy steps and deeper analysis for complex reasoning. Without any training or additional cost, our plug-and-play method yields an average +1.3% accuracy with -8.6% token usage across leading LRMs and advanced reasoning benchmarks. All of our algorithms are implemented based on vLLM and are expected to support broader applications and inspire future research.
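The abstract's core mechanism — identifying a direction in representation space that separates slow and fast thinking, then shifting hidden states along it at test time — can be sketched as a difference-of-means steering vector. The function names, the synthetic activations, and the scaling factor `alpha` below are illustrative assumptions for a minimal sketch, not the paper's actual vLLM-integrated implementation:

```python
import numpy as np

def steering_vector(slow_acts, fast_acts):
    # Difference-of-means direction pointing from "slow thinking"
    # activations toward "fast thinking" activations, unit-normalized.
    v = fast_acts.mean(axis=0) - slow_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def edit_hidden_state(h, v, alpha):
    # Representation editing: shift a hidden state along the steering
    # direction. alpha > 0 pushes toward fast thinking, alpha < 0 toward slow.
    return h + alpha * v

# Toy activations standing in for hidden states collected under
# slow-thinking vs. fast-thinking prompts (32 samples, 64 dims).
rng = np.random.default_rng(0)
slow = rng.normal(0.0, 1.0, (32, 64))
fast = rng.normal(0.5, 1.0, (32, 64))

v = steering_vector(slow, fast)
h = rng.normal(size=64)
h_fast = edit_hidden_state(h, v, alpha=2.0)  # steered toward fast thinking
```

In the paper's setting the edit would be applied to a decoder layer's hidden states during generation; here the shift is shown on a single vector for clarity.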
Problem

Research questions and friction points this paper is trying to address.

Enable LRMs to adjust thinking speed dynamically
Control slow-fast thinking transitions in LRMs
Optimize accuracy-efficiency trade-offs in reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic thinking speed adjustment in LRMs
Representation editing-based test-time scaling
Real-time difficulty estimation for reasoning
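The difficulty-estimation idea above can be illustrated with a toy switching policy that reads per-token log-probabilities (the kind of signal a serving engine such as vLLM can expose) and steers toward slow thinking on low-confidence segments. The difficulty proxy, the threshold, and the `alpha` values are illustrative assumptions, not the paper's estimator:

```python
import math

def segment_difficulty(token_logprobs):
    # Mean negative log-probability as a crude difficulty proxy:
    # low-confidence (low-logprob) tokens suggest a harder segment.
    return -sum(token_logprobs) / len(token_logprobs)

def choose_alpha(token_logprobs, threshold=1.0, alpha_fast=2.0, alpha_slow=-2.0):
    # Hard segment -> steer toward slow thinking; easy segment -> fast.
    return alpha_slow if segment_difficulty(token_logprobs) > threshold else alpha_fast

easy = [math.log(0.9)] * 8   # confident tokens
hard = [math.log(0.2)] * 8   # uncertain tokens

choose_alpha(easy)  # -> 2.0 (fast thinking)
choose_alpha(hard)  # -> -2.0 (slow thinking)
```

The chosen `alpha` would then scale the steering vector applied to hidden states, so easy reasoning segments are processed quickly while complex ones receive deeper deliberation.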
Zhengkai Lin
State Key Lab of CAD&CG, Zhejiang University

Zhihang Fu
Alibaba Cloud
Computer Vision · Machine Learning · LLM

Ze Chen
Alibaba Group
Computer Vision

Chao Chen
Alibaba Cloud

Liang Xie
Wuhan University of Technology
Time Series Forecasting · Cross-modal Learning

Wenxiao Wang
School of Software Technology, Zhejiang University

Deng Cai
Professor of Computer Science, Zhejiang University
Machine Learning · Computer Vision · Data Mining · Information Retrieval

Zheng Wang
Alibaba Cloud

Jieping Ye
Alibaba Cloud