Compressing Chain-of-Thought in LLMs via Step Entropy

📅 2025-08-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the high computational cost of chain-of-thought (CoT) reasoning in large language models (LLMs) caused by redundant inference steps, this paper proposes an adaptive compression framework based on **step-wise entropy**. The authors formally define and quantify the information contribution of each reasoning step, revealing that approximately 80% of low-entropy steps can be safely pruned. Building on this insight, they design a two-stage training strategy, supervised fine-tuning (SFT) followed by Group Relative Policy Optimization (GRPO), that enables models to autonomously generate compact CoTs annotated with [SKIP] tokens. Extensive evaluation on multiple mathematical reasoning benchmarks demonstrates that DeepSeek-R1-7B/14B and Qwen3-8B achieve **80% compression of intermediate reasoning steps** with only marginal accuracy degradation (≤1.2%), yielding substantial gains in inference efficiency. The core contributions are: (i) a principled step-wise entropy model for CoT analysis, and (ii) an end-to-end learnable skip-step mechanism.
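
As a rough illustration of the step-entropy idea (a sketch, not the paper's exact formulation), the snippet below scores each reasoning step by the mean token-level Shannon entropy of the model's next-token distributions over that step's tokens; all function names are hypothetical.

```python
import torch
import torch.nn.functional as F

def step_entropy(step_logits: torch.Tensor) -> float:
    """Mean Shannon entropy (in nats) of the next-token distributions
    over the tokens belonging to one reasoning step.

    step_logits: (step_len, vocab_size) logits from a forward pass.
    """
    log_probs = F.log_softmax(step_logits, dim=-1)               # (T, V)
    token_entropy = -(log_probs.exp() * log_probs).sum(dim=-1)   # (T,)
    return token_entropy.mean().item()

def rank_steps_by_entropy(logits_per_step: list) -> list:
    """Step indices ordered from lowest to highest entropy;
    the lowest-entropy steps are the pruning candidates."""
    scores = [step_entropy(l) for l in logits_per_step]
    return sorted(range(len(scores)), key=scores.__getitem__)
```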

📝 Abstract
Large Language Models (LLMs) using Chain-of-Thought (CoT) prompting excel at complex reasoning but generate verbose thought processes with considerable redundancy, leading to increased inference costs and reduced efficiency. We introduce a novel CoT compression framework based on step entropy, a metric that quantifies the informational contribution of individual reasoning steps to identify redundancy. Through theoretical analysis and extensive empirical validation on mathematical reasoning benchmarks, we demonstrate that steps with low entropy are indeed highly redundant. Our experiments reveal that an astonishing 80% of low-entropy intermediate steps can be pruned with minor degradation in the final answer accuracy across DeepSeek-R1-7B/14B and Qwen3-8B. This finding sharply contrasts with random or high-entropy pruning, which severely impairs reasoning performance. Building on this, we propose a novel two-stage training strategy combining Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) reinforcement learning. This approach enables LLMs to autonomously learn to generate compressed CoTs during inference by strategically incorporating [SKIP] tokens. Our method significantly enhances LLM inference efficiency while rigorously preserving accuracy, offering profound implications for practical LLM deployment and a deeper understanding of reasoning structures.
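
To make the reported 80% figure concrete, here is a minimal pruning sketch, assuming steps have already been scored as above. The prune ratio and the rule of always keeping the final answer step are assumptions chosen to mirror the reported setting, not the paper's exact procedure.

```python
def prune_low_entropy_steps(steps, entropies, prune_ratio=0.8):
    """Drop the lowest-entropy intermediate steps from a CoT trace.

    steps / entropies: aligned lists covering the intermediate steps
    plus a final answer step, which is never pruned. prune_ratio=0.8
    mirrors the paper's reported compression level (assumed convention).
    """
    body, answer = steps[:-1], steps[-1]
    n_prune = int(len(body) * prune_ratio)
    order = sorted(range(len(body)), key=lambda i: entropies[i])
    dropped = set(order[:n_prune])            # lowest-entropy indices
    kept = [s for i, s in enumerate(body) if i not in dropped]
    return kept + [answer]
```

Pruning `order[-n_prune:]` instead (the highest-entropy steps) would model the baseline the abstract reports as severely impairing reasoning performance.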
Problem

Research questions and friction points this paper is trying to address.

How to reduce redundancy in Chain-of-Thought reasoning steps
How to identify low-entropy steps that can be pruned without accuracy loss
How to train LLMs to autonomously generate compressed reasoning paths
Innovation

Methods, ideas, or system contributions that make the work stand out.

Step entropy metric identifies redundant reasoning steps
Two-stage training combines SFT and GRPO reinforcement
Autonomous [SKIP] token usage compresses CoT generation (see the sketch after this list)
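
A toy sketch of the training signal behind the last two bullets: a reward that pays for correct answers plus a bonus for replacing redundant steps with [SKIP] tokens, and the group-relative advantage normalization that gives GRPO its name (advantages are computed against a group of rollouts for the same prompt rather than a learned value baseline). The reward shape and the alpha weight are assumptions, not the paper's exact objective.

```python
import statistics

def compression_reward(answer_correct, n_steps_emitted, n_steps_full, alpha=0.5):
    """Toy reward: 0 for wrong answers; otherwise 1 plus a bonus
    proportional to how many steps were replaced by [SKIP].

    n_steps_emitted: steps actually generated (not skipped).
    n_steps_full: steps in the uncompressed reference trace.
    alpha (assumed) trades accuracy pressure against compression.
    """
    if not answer_correct:
        return 0.0  # never reward a compressed but wrong trace
    compression = 1.0 - n_steps_emitted / max(n_steps_full, 1)
    return 1.0 + alpha * compression

def grpo_advantages(rewards):
    """Group-relative advantages: standardize each rollout's reward
    against the group sampled for the same prompt."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero spread
    return [(r - mu) / sigma for r in rewards]
```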