Dynamic Early Exit in Reasoning Models

📅 2025-04-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the efficiency degradation and accuracy loss in large reasoning language models caused by excessive reasoning during long chain-of-thought (CoT) generation, this paper proposes a training-free dynamic early-exit mechanism. The method leverages token-level confidence scores intrinsic to the model to assess and autonomously terminate redundant reasoning steps in real time—particularly at reasoning transition points (e.g., “Wait” tokens)—thereby overcoming the limitations of fixed-length truncation. Its core innovations include token-behavior monitoring, adaptive confidence modeling, and a dynamic termination policy, all natively compatible with o1-style reasoning architectures. Evaluated on four major benchmarks—including MATH-500—the approach achieves 31–43% average CoT compression while improving accuracy by 1.7–5.7 percentage points, marking the first demonstration of concurrent high accuracy and high efficiency in CoT-based reasoning.

📝 Abstract
Recent advances in large reasoning language models (LRLMs) rely on test-time scaling, which extends long chain-of-thought (CoT) generation to solve complex tasks. However, overthinking in long CoT not only slows down problem solving but also risks accuracy loss due to extremely detailed or redundant reasoning steps. We propose a simple yet effective method that allows LLMs to self-truncate CoT sequences by early exit during generation. Instead of relying on fixed heuristics, the proposed method monitors model behavior at potential reasoning transition points (e.g., "Wait" tokens) and dynamically terminates the next reasoning chain's generation when the model exhibits high confidence in a trial answer. Our method requires no additional training and can be seamlessly integrated into existing o1-like reasoning LLMs. Experiments on multiple reasoning benchmarks (MATH-500, AMC 2023, GPQA Diamond, and AIME 2024) show that the proposed method is consistently effective on DeepSeek-series reasoning LLMs, reducing the length of CoT sequences by an average of 31% to 43% while improving accuracy by 1.7% to 5.7%.
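The core loop described in the abstract — monitor reasoning transition points (e.g., "Wait" tokens) and stop once the model is confident in a trial answer — can be sketched as follows. This is a minimal, model-free illustration, not the authors' implementation: the transition-token set, the 0.9 confidence threshold, and the caller-supplied confidence map are all assumptions made for the sketch.

```python
# Hypothetical sketch of dynamic early exit for CoT generation.
# Assumptions (not from the paper's code): transition tokens are just
# the literal string "Wait", and trial-answer confidences are supplied
# by the caller instead of being read from the model's token logits.

TRANSITION_TOKENS = {"Wait"}  # reasoning transition points to monitor
CONF_THRESHOLD = 0.9          # assumed cutoff for "high confidence"

def dynamic_early_exit(steps, confidences, threshold=CONF_THRESHOLD):
    """Consume a stream of reasoning tokens; truncate at the first
    transition point where the trial-answer confidence clears the
    threshold. `confidences` maps a transition token's index to the
    model's confidence in a trial answer elicited at that point."""
    kept = []
    for i, tok in enumerate(steps):
        if tok in TRANSITION_TOKENS and confidences.get(i, 0.0) >= threshold:
            break  # confident trial answer: skip the redundant next chain
        kept.append(tok)
    return kept

# Toy stream: the second "Wait" (index 3) arrives with confidence 0.95,
# so generation halts there instead of producing "step3".
stream = ["step1", "Wait", "step2", "Wait", "step3"]
conf = {1: 0.4, 3: 0.95}
print(dynamic_early_exit(stream, conf))  # ['step1', 'Wait', 'step2']
```

In a real deployment the confidence would come from the model itself (e.g., token-level probabilities of a trial answer generated at the transition point), which is what makes the method training-free.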
Problem

Research questions and friction points this paper is trying to address.

Reduces overthinking in long chain-of-thought generation
Dynamically truncates reasoning steps via early exit
Improves efficiency and accuracy without extra training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-truncating CoT sequences via early exit
Dynamic termination at high confidence points
No additional training, integrates seamlessly
🔎 Similar Papers
No similar papers found.
Chenxu Yang
Institute of Information Engineering, Chinese Academy of Sciences
NLP, Dialogue Generation
Q. Si
Huawei Technologies Co., Ltd.
Yongjie Duan
Tencent, Tsinghua University
Computer Vision, LLM, Efficient Inference, Multi-Modality
Zheliang Zhu
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Chenyu Zhu
Huawei Technologies Co., Ltd.
Zheng Lin
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Li Cao
Huawei Technologies Co., Ltd.
Weiping Wang
School of Information Science and Engineering, Central South University
Computer Network, Network Security