Towards Efficient Large Language Reasoning Models via Extreme-Ratio Chain-of-Thought Compression

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational cost of Chain-of-Thought (CoT) reasoning in large language models, which existing compression methods struggle to mitigate without compromising logical fidelity under aggressive compression ratios. To overcome this challenge, the authors propose Extra-CoT, a novel framework that employs a semantics-preserving compressor to generate high-fidelity supervision data, combined with mixed-ratio supervised fine-tuning and a new Constrained Hierarchical Ratio Policy Optimization (CHRPO) algorithm. This approach enables highly efficient and accurate reasoning even at extreme compression levels. Evaluated on three mathematical reasoning benchmarks—including MATH-500—Extra-CoT significantly outperforms state-of-the-art methods, achieving over 73% token compression on Qwen3-1.7B while simultaneously improving accuracy by 0.6%.

📝 Abstract
Chain-of-Thought (CoT) reasoning successfully enhances the reasoning capabilities of Large Language Models (LLMs), yet it incurs substantial computational overhead at inference time. Existing CoT compression methods often suffer a critical loss of logical fidelity at high compression ratios, resulting in significant performance degradation. To achieve high-fidelity, fast reasoning, we propose a novel EXTreme-RAtio Chain-of-Thought Compression framework, termed Extra-CoT, which aggressively reduces the token budget while preserving answer accuracy. To generate reliable, high-fidelity supervision, we first train a dedicated semantics-preserving compressor on mathematical CoT data with fine-grained annotations. An LLM is then fine-tuned on these compressed pairs via mixed-ratio supervised fine-tuning (SFT), teaching it to follow a spectrum of compression budgets and providing a stable initialization for reinforcement learning (RL). We further propose Constrained and Hierarchical Ratio Policy Optimization (CHRPO), which explicitly incentivizes question-solving ability under lower budgets via a hierarchical reward. Experiments on three mathematical reasoning benchmarks show the superiority of Extra-CoT. For example, on MATH-500 using Qwen3-1.7B, Extra-CoT achieves over 73% token reduction with an accuracy improvement of 0.6%, significantly outperforming state-of-the-art (SOTA) methods.
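The abstract describes CHRPO as incentivizing question-solving ability under lower token budgets via a hierarchical reward. As a minimal sketch of what such a reward could look like, the function below gates everything on answer correctness and grants a length bonus only when the response stays within its compression budget. The function name, tiers, and constants are illustrative assumptions, not the paper's actual reward definition:

```python
def hierarchical_reward(is_correct: bool,
                        response_tokens: int,
                        budget_tokens: int) -> float:
    """Scalar reward for one sampled response (illustrative sketch).

    Tier 1: wrong answers get a flat penalty regardless of length.
    Tier 2: correct answers earn a base reward, plus a bonus that
            grows as the response shrinks below its token budget.
    Responses exceeding the budget forfeit the bonus (constraint).
    """
    if not is_correct:
        return -1.0                      # correctness gate dominates
    base = 1.0
    if response_tokens > budget_tokens:  # budget constraint violated
        return base
    # Bonus in [0, 1): larger when fewer tokens are used.
    bonus = 1.0 - response_tokens / budget_tokens
    return base + bonus
```

Under this shape, a correct 50-token answer against a 100-token budget scores 1.5, while a correct but over-budget answer scores only the base 1.0, so the policy is pushed toward shorter correct chains without ever preferring a wrong short answer.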
Problem

Research questions and friction points this paper is trying to address.

Chain-of-Thought
compression
reasoning
large language models
computational overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Thought Compression
Extreme-Ratio Compression
Constrained and Hierarchical Ratio Policy Optimization
Supervised Fine-Tuning
Token Efficiency
Yuntian Tang
East China Normal University, China
Bohan Jia
East China Normal University
MLLM, LLM, AIGC
Wenxuan Huang
CUHK & ECNU
Artificial General Intelligence, MLLM, LLM, AIGC, Model Acceleration
Lianyue Zhang
East China Normal University, China
Jiao Xie
East China Normal University, China
Wenxi Li
East China Normal University, China
Wei Li
Huawei Noah's Ark Lab
Low-level Vision, Computer Vision, AIGC
Jie Hu
HUAWEI, USTC
Computer Vision, Low-level Vision, AIGC
Xinghao Chen
Noah's Ark Lab, Huawei
Computer Vision, Machine Learning, Deep Learning
Rongrong Ji
Key Laboratory of Multimedia Trusted Perception and Efficient Computing of Ministry of Education of China, Xiamen University, China
Shaohui Lin
East China Normal University, China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing of Ministry of Education of China, Xiamen University, China