🤖 AI Summary
To address the efficiency degradation and accuracy loss that excessive reasoning causes in large reasoning language models during long chain-of-thought (CoT) generation, this paper proposes a training-free dynamic early-exit mechanism. The method leverages the model's intrinsic token-level confidence scores to assess and autonomously terminate redundant reasoning steps in real time, particularly at reasoning transition points (e.g., "Wait" tokens), thereby overcoming the limitations of fixed-length truncation. Its core components are token-behavior monitoring, adaptive confidence modeling, and a dynamic termination policy, all natively compatible with o1-style reasoning architectures. Evaluated on four major benchmarks (including MATH-500), the approach achieves 31–43% average CoT compression while improving accuracy by 1.7% to 5.7%, demonstrating that high accuracy and high efficiency in CoT-based reasoning can be achieved concurrently.
📝 Abstract
Recent advances in large reasoning language models (LRLMs) rely on test-time scaling, which extends long chain-of-thought (CoT) generation to solve complex tasks. However, overthinking in long CoT not only slows down problem solving, but also risks accuracy loss due to extremely detailed or redundant reasoning steps. We propose a simple yet effective method that allows LLMs to self-truncate CoT sequences by exiting early during generation. Instead of relying on fixed heuristics, the proposed method monitors model behavior at potential reasoning transition points (e.g., "Wait" tokens) and dynamically terminates the next reasoning chain's generation when the model exhibits high confidence in a trial answer. Our method requires no additional training and can be seamlessly integrated into existing o1-like reasoning LLMs. Experiments on multiple reasoning benchmarks (MATH-500, AMC 2023, GPQA Diamond, and AIME 2024) show that the proposed method is consistently effective on DeepSeek-series reasoning LLMs, reducing the length of CoT sequences by an average of 31% to 43% while improving accuracy by 1.7% to 5.7%.
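The abstract's mechanism, probing a trial answer at reasoning transition points and stopping once its token-level confidence is high, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the transition-marker set, the threshold value, and the helper names (`answer_confidence`, `should_exit`, `reason_with_early_exit`) are all assumptions for exposition, and a real system would probe a live model for trial-answer log-probabilities instead of receiving them precomputed.

```python
import math

# Assumed set of tokens that mark a transition to a new reasoning chain.
WAIT_TOKENS = {"Wait", "Alternatively", "Hmm"}
# Assumed confidence cutoff for terminating generation early.
CONF_THRESHOLD = 0.9


def answer_confidence(logprobs):
    """Mean token probability of a trial answer (geometric mean of probs)."""
    return math.exp(sum(logprobs) / len(logprobs))


def should_exit(trial_answer_logprobs, threshold=CONF_THRESHOLD):
    """Exit when the model's trial answer is high-confidence."""
    return answer_confidence(trial_answer_logprobs) >= threshold


def reason_with_early_exit(steps):
    """steps: list of (step_text, trial_answer_logprobs or None).
    A trial answer is probed only at reasoning transition points;
    remaining reasoning chains are truncated once confidence is high."""
    kept = []
    for text, probe in steps:
        kept.append(text)
        at_transition = any(text.startswith(w) for w in WAIT_TOKENS)
        if at_transition and probe is not None and should_exit(probe):
            break  # self-truncate the rest of the chain-of-thought
    return kept
```

With per-token probabilities around 0.95 at a "Wait" transition, the chain is cut there; with probabilities around 0.5, generation continues through all steps.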