ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning

📅 2025-05-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large reasoning models (LRMs) frequently generate redundant reasoning steps in chain-of-thought (CoT) inference, leading to high computational overhead and a degraded user experience; existing compression methods, largely post-hoc or sampling-based, struggle to preserve reasoning coherence. This work identifies two root causes of redundancy: insufficient step-wise confidence and delayed termination. We propose the first confidence-guided real-time compression framework, which suppresses both forms of redundant reflection at the generation source. Our approach integrates intermediate-layer confidence modeling and injection, dynamic early stopping, and SimPO-based alignment fine-tuning. Experiments across multiple reasoning benchmarks show that our method reduces average CoT length by roughly 50% while maintaining near-original accuracy, significantly outperforming prior compression techniques.
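
As a rough illustration of the confidence signal and dynamic early stopping described above, here is a minimal sketch that approximates a step's confidence by the geometric-mean probability of its tokens; the threshold, patience window, and aggregation are illustrative assumptions, not the paper's exact formulation (ConCISE models confidence from intermediate layers).

```python
import math

# Hedged sketch: approximate a reasoning step's confidence by the
# geometric-mean probability of its tokens. Threshold and patience
# window are illustrative assumptions.

def step_confidence(token_logprobs):
    """Geometric-mean token probability of one reasoning step."""
    return math.exp(sum(token_logprobs) / max(len(token_logprobs), 1))

def should_stop(confidences, threshold=0.9, patience=2):
    """Dynamic early stopping: halt once the last `patience` steps all
    exceed `threshold`, i.e. the model is already confident."""
    return len(confidences) >= patience and all(
        c >= threshold for c in confidences[-patience:]
    )

# Toy trace: per-token log-probs for four reasoning steps; confidence
# rises as the chain converges, and generation stops once two
# consecutive steps clear the threshold.
chain = [[-0.9, -0.7], [-0.3, -0.2], [-0.05, -0.02], [-0.04, -0.01]]
confs = []
for i, step in enumerate(chain, 1):
    confs.append(step_confidence(step))
    print(f"step {i}: confidence={confs[-1]:.3f} stop={should_stop(confs)}")
```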

📝 Abstract
Large Reasoning Models (LRMs) perform strongly in complex reasoning tasks via Chain-of-Thought (CoT) prompting, but often suffer from verbose outputs caused by redundant content, which increase computational overhead and degrade user experience. Existing compression methods either perform post-hoc pruning, risking disruption of reasoning coherence, or rely on sampling-based selection, which fails to intervene effectively during generation. In this work, we introduce a confidence-guided perspective to explain the emergence of redundant reflection in LRMs, identifying two key patterns: Confidence Deficit, where the model reconsiders correct steps due to low internal confidence, and Termination Delay, where reasoning continues even after reaching a confident answer. Based on this analysis, we propose ConCISE (Confidence-guided Compression In Step-by-step Efficient Reasoning), a framework that simplifies reasoning chains by reinforcing the model's confidence during inference, thus preventing the generation of redundant reflection steps. It integrates Confidence Injection to stabilize intermediate steps and Early Stopping to terminate reasoning when confidence is sufficient. Extensive experiments demonstrate that fine-tuning LRMs on ConCISE-generated data yields significantly shorter outputs, reducing length by up to approximately 50% under SimPO, while maintaining high task accuracy. ConCISE consistently outperforms existing baselines across multiple reasoning benchmarks.
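
For the SimPO fine-tuning stage the abstract mentions, ConCISE-compressed chains can serve as preferred responses and the original verbose chains as rejected ones. The sketch below implements the published SimPO objective (a length-normalized log-likelihood margin with target margin gamma); the beta and gamma values and the toy inputs are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of SimPO preference fine-tuning on ConCISE-style pairs:
# the concise chain is "chosen", the verbose original is "rejected".
# beta/gamma and the toy numbers are assumptions.

def simpo_loss(logp_chosen, logp_rejected, len_chosen, len_rejected,
               beta=2.0, gamma=0.5):
    """SimPO: length-normalized implicit reward with margin gamma.
    loss = -log sigmoid(beta/|y_w| * logp(y_w)
                        - beta/|y_l| * logp(y_l) - gamma)
    """
    r_chosen = beta * logp_chosen / len_chosen
    r_rejected = beta * logp_rejected / len_rejected
    return -F.logsigmoid(r_chosen - r_rejected - gamma).mean()

# Toy batch: the short chain has higher per-token likelihood, so its
# length-normalized reward wins and the loss is pushed down.
loss = simpo_loss(
    logp_chosen=torch.tensor([-40.0]),    # sum log-prob, 50-token concise chain
    logp_rejected=torch.tensor([-200.0]), # sum log-prob, 150-token verbose chain
    len_chosen=torch.tensor([50.0]),
    len_rejected=torch.tensor([150.0]),
)
print(f"SimPO loss: {loss.item():.4f}")
```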
Problem

Research questions and friction points this paper is trying to address.

Redundant outputs in Large Reasoning Models increase computational overhead and degrade user experience.
Existing compression methods either disrupt reasoning coherence (post-hoc pruning) or fail to intervene during generation (sampling-based selection).
Open challenge: cutting CoT length substantially (ConCISE targets roughly 50%) without sacrificing accuracy.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Confidence-guided compression prevents redundant reflection steps at the generation source.
Integrates Confidence Injection to stabilize intermediate steps (a minimal decode-time sketch follows this list).
Employs Early Stopping to terminate reasoning once confidence is sufficient.
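
To make the two mechanisms concrete, here is a minimal decode-time sketch. The `generate_step` and `estimate_confidence` callables, the injected phrase, and both thresholds are hypothetical stand-ins; the paper derives confidence from intermediate-layer signals rather than this exact string-level scheme.

```python
# Hedged sketch of Confidence Injection plus Early Stopping at decode
# time. CONFIDENCE_PHRASE, the thresholds, and the two callables are
# hypothetical, not the paper's exact mechanism.

CONFIDENCE_PHRASE = " I am confident this step is correct."  # assumed text

def compress_generation(prompt, generate_step, estimate_confidence,
                        max_steps=16, inject_below=0.7, stop_above=0.9):
    """Generate a reasoning chain step by step, injecting an affirmation
    after low-confidence steps and stopping early once confidence is high."""
    context = prompt
    for _ in range(max_steps):
        step = generate_step(context)           # next reasoning step (string)
        conf = estimate_confidence(context, step)
        context += step
        if conf >= stop_above:                  # Early Stopping
            break
        if conf < inject_below:                 # Confidence Injection
            context += CONFIDENCE_PHRASE
    return context
```

The same confidence proxy drives both interventions: low confidence triggers an injection that forestalls redundant re-checking, while sustained high confidence terminates the chain.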
Ziqing Qiao
Tsinghua University
Yongheng Deng
Tsinghua University
Jiali Zeng
Tencent
Natural Language Processing · Deep Learning · Neural Machine Translation
Dong Wang
Tsinghua University
Lai Wei
Tsinghua University
Fandong Meng
WeChat AI, Tencent
Machine Translation · Natural Language Processing
Jie Zhou
Pattern Recognition Center, WeChat AI, Tencent Inc., China
Ju Ren
Department of Computer Science and Technology, Tsinghua University
Internet-of-Things · Edge Computing/Intelligence · Security and Privacy
Yaoxue Zhang
Tsinghua University