SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking

📅 2026-04-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge that large reasoning models often generate excessively lengthy reasoning chains due to “overthinking,” making it difficult to compress output while preserving logical coherence. To tackle this, the authors propose Stepwise Adaptive Thinking (SAT), a novel framework that enables step-level adaptive control of reasoning for the first time. SAT models the reasoning process as a finite state machine and employs a lightweight process reward model to dynamically select among inference modes—slow, normal, fast, or skip—based on local difficulty, thereby achieving difficulty-aware, progressive pruning. Extensive experiments across nine large language models and seven benchmarks demonstrate that SAT reduces reasoning token usage by 40% on average while consistently maintaining or even improving accuracy.
📝 Abstract
Large Reasoning Models (LRMs) have revolutionized complex problem-solving, yet they exhibit a pervasive "overthinking" problem, generating unnecessarily long reasoning chains. While current solutions improve token efficiency, they often sacrifice fine-grained control or risk disrupting the logical integrity of the reasoning process. To address this, we introduce Stepwise Adaptive Thinking (SAT), a framework that performs step-level, difficulty-aware pruning while preserving the core reasoning structure. SAT formulates reasoning as a Finite-State Machine (FSM) with distinct thinking modes (Slow, Normal, Fast, Skip). It navigates these states dynamically using a lightweight Process Reward Model (PRM), compressing easy steps while preserving depth for hard ones. Experiments across 9 LRMs and 7 benchmarks show that SAT achieves up to a 40% reduction in reasoning tokens while generally maintaining or improving accuracy.
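The step-level control loop the abstract describes can be pictured as a small state machine: a process reward model scores each reasoning step's local difficulty, and that score selects one of the four thinking modes. The sketch below is purely illustrative, not the authors' implementation; the threshold values and the `select_mode`/`run_fsm` helper names are assumptions, and the PRM is stood in for by precomputed difficulty scores.

```python
# Hypothetical sketch of SAT-style step-level mode selection.
# Thresholds and function names are illustrative, not from the paper;
# a real PRM would produce the difficulty scores dynamically.
from enum import Enum

class Mode(Enum):
    SLOW = "slow"      # full reasoning depth for hard steps
    NORMAL = "normal"  # default reasoning depth
    FAST = "fast"      # compressed reasoning for easy steps
    SKIP = "skip"      # prune the step entirely

def select_mode(difficulty: float) -> Mode:
    """Map a PRM difficulty score in [0, 1] to a thinking mode.
    The cutoffs below are assumed for illustration only."""
    if difficulty >= 0.75:
        return Mode.SLOW
    if difficulty >= 0.40:
        return Mode.NORMAL
    if difficulty >= 0.15:
        return Mode.FAST
    return Mode.SKIP

def run_fsm(step_difficulties):
    """Walk the reasoning chain, assigning a mode per step."""
    return [select_mode(d) for d in step_difficulties]

# Easy steps get compressed or skipped; hard ones keep full depth.
modes = run_fsm([0.9, 0.5, 0.2, 0.05])
print([m.value for m in modes])  # ['slow', 'normal', 'fast', 'skip']
```

This captures the "difficulty-aware, progressive pruning" idea: token savings come from routing easy steps into cheaper modes rather than truncating the chain globally.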
Problem

Research questions and friction points this paper is trying to address.

overthinking
reasoning efficiency
Large Reasoning Models
logical integrity
token efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stepwise Adaptive Thinking
Finite-State Machine
Process Reward Model
Reasoning Efficiency
Large Reasoning Models
Weiyang Huang
Harbin Institute of Technology (Shenzhen), Shenzhen, China
Xuefeng Bai
Harbin Institute of Technology (Shenzhen)
Natural language processing, Semantics, Dialogue
Kehai Chen
Harbin Institute of Technology (Shenzhen)
LLM, Natural Language Processing, Agent, Multi-model Generation
Xinyang Chen
Associate Professor, Harbin Institute of Technology (Shenzhen)
machine learning, multimodal learning, transfer learning
Yibin Chen
Huawei Technologies
Weili Guan
Harbin Institute of Technology (Shenzhen), Shenzhen, China
Min Zhang
Harbin Institute of Technology (Shenzhen), Shenzhen, China