Entropy-Guided Reasoning Compression

📅 2025-11-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large reasoning models suffer from excessive computational overhead and deployment challenges due to overly long chain-of-thought (CoT) reasoning. Existing compression methods overlook an "entropy conflict" during training: compression objectives suppress high-entropy tokens (primarily redundant logical connectives) to shorten reasoning chains, while accuracy objectives encourage their generation to sustain reasoning exploration, creating an optimization dilemma. This work is the first to formally model this conflict, exposing both its gradient dynamics and its linguistic origin in redundant logical connectives. The authors propose an entropy-guided training framework that dynamically regulates reasoning entropy to balance conciseness against exploratory capability. Evaluated on six mathematical reasoning benchmarks, the method compresses reasoning length to 20% of the original while matching or surpassing baseline accuracy, breaking the traditional performance-efficiency trade-off in CoT compression.

📝 Abstract
Large reasoning models have demonstrated remarkable performance on complex reasoning tasks, yet the excessive length of their chain-of-thought outputs remains a major practical bottleneck due to high computation cost and poor deployability. Existing compression methods have achieved partial success but overlook a crucial phenomenon in the training process: the entropy conflict. During compression training, entropy decreases, leading to shorter reasoning but limited exploration, while accuracy-oriented objectives increase entropy, lengthening reasoning chains. This can trap the model in a local dilemma. Our analysis further reveals the origin of the entropy conflict: many high-entropy tokens are logical connectors that receive larger gradients and are encouraged under the performance objective, while the compression objective simultaneously penalizes these potentially redundant connectors. This opposing pressure is a direct source of the entropy conflict. To address these issues, we adopt an entropy-guided training framework: as entropy descends, the model is guided toward efficient reasoning by encouraging concise thought steps; as entropy rises, exploration is reinforced within the compact reasoning mode to improve robustness. Experiments on six mathematical benchmarks show that our method compresses reasoning length to 20% of the original while maintaining or even surpassing baseline accuracy. Code and models will be released publicly.
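The entropy-guided regulation described in the abstract can be sketched as a training objective that blends a standard accuracy term with an entropy-dependent regularizer. The sketch below is a hypothetical illustration, not the paper's actual loss: the function names (`token_entropy`, `entropy_guided_loss`), the entropy threshold, and the coefficients are all assumptions made for demonstration, since the paper's formulation is not given here.

```python
import torch
import torch.nn.functional as F

def token_entropy(logits):
    """Shannon entropy of the next-token distribution at each position.

    logits: (batch, seq_len, vocab) -> returns (batch, seq_len).
    """
    log_p = F.log_softmax(logits, dim=-1)
    return -(log_p.exp() * log_p).sum(dim=-1)

def entropy_guided_loss(logits, targets, target_entropy=1.5,
                        compress_coef=0.01, explore_coef=0.01):
    """Hypothetical sketch of an entropy-guided objective.

    When the mean reasoning entropy sits above a target level, add a
    penalty that pushes entropy down (favoring concise chains); when it
    falls below the target, reward entropy to preserve exploration
    within the compact reasoning mode. All hyperparameters are
    illustrative assumptions.
    """
    # Accuracy term: standard next-token negative log-likelihood.
    nll = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                          targets.reshape(-1))
    ent = token_entropy(logits).mean()
    if ent > target_entropy:
        reg = compress_coef * ent    # compression phase: suppress entropy
    else:
        reg = -explore_coef * ent    # exploration phase: encourage entropy
    return nll + reg
```

The key design point this sketch captures is that the sign of the entropy term flips depending on the current entropy level, so compression and exploration pressures are applied in alternation rather than simultaneously, which is one plausible way to avoid the opposing-gradient dilemma the paper identifies.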
Problem

Research questions and friction points this paper is trying to address.

Compressing lengthy chain-of-thought outputs from reasoning models
Resolving entropy conflict between compression and accuracy objectives
Maintaining reasoning accuracy while reducing computation costs significantly
Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropy-guided training framework for compression
Balances reasoning length reduction with accuracy
Maintains performance while compressing reasoning chains
Hourun Zhu
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Yang Gao
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Wenlong Fei
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Jiawei Li
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Huashan Sun
Beijing Institute of Technology
AINLP