🤖 AI Summary
This work addresses the inefficiency of Chain-of-Thought (CoT) reasoning in large language models, where verbose reasoning trajectories incur high latency and memory overhead, and existing compression methods struggle to balance semantic fidelity with computational efficiency. To this end, we propose CtrlCoT, a dual-granularity CoT compression framework that integrates multi-level semantic abstraction with logic-aware token pruning. By leveraging logic-preserving distillation and aligning the distribution of reasoning styles, CtrlCoT effectively compresses reasoning traces while maintaining—or even enhancing—reasoning accuracy. Evaluated on the MATH-500 benchmark, CtrlCoT reduces inference tokens by 30.7% compared to the strongest baseline while improving accuracy by 7.6 percentage points, demonstrating a significant advance in achieving both efficiency and reliability in compressed CoT reasoning.
📝 Abstract
Chain-of-thought (CoT) prompting improves LLM reasoning but incurs high latency and memory cost due to verbose traces, motivating CoT compression with preserved correctness. Existing methods either shorten CoTs at the semantic level, which is often conservative, or prune tokens aggressively, which can miss task-critical cues and degrade accuracy. Moreover, combining the two is non-trivial due to sequential dependency, task-agnostic pruning, and distribution mismatch. We propose **CtrlCoT**, a dual-granularity CoT compression framework that harmonizes semantic abstraction and token-level pruning through three components: Hierarchical Reasoning Abstraction produces CoTs at multiple semantic granularities; Logic-Preserving Distillation trains a logic-aware pruner to retain indispensable reasoning cues (e.g., numbers and operators) across pruning ratios; and Distribution-Alignment Generation aligns compressed traces with fluent inference-time reasoning styles to avoid fragmentation. On MATH-500 with Qwen2.5-7B-Instruct, CtrlCoT uses 30.7% fewer tokens while achieving accuracy 7.6 percentage points higher than the strongest baseline, demonstrating more efficient and reliable reasoning. Our code will be publicly available at https://github.com/fanzhenxuan/Ctrl-CoT.
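The idea of logic-aware pruning described above can be illustrated with a toy sketch. This is not the paper's actual pruner (which is a trained, distilled model); it is a hypothetical heuristic that captures the stated constraint: tokens carrying reasoning-critical content (numbers, operators) are always retained, and the remaining token budget, set by a pruning ratio, is filled by an importance score (here, a stand-in word-length heuristic). The function name `prune_cot` and the scoring rule are illustrative assumptions.

```python
import re

# Tokens matching this pattern (numbers and math operators) are treated as
# logic-critical and are never pruned, regardless of the keep ratio.
LOGIC_PATTERN = re.compile(r"^\d+(\.\d+)?$|^[+\-*/=<>%^()]$")

def prune_cot(tokens, keep_ratio=0.5):
    """Keep all logic-critical tokens; fill any remaining budget greedily.

    keep_ratio sets the target fraction of tokens to retain; the budget is
    never allowed to drop below the number of logic-critical tokens.
    """
    critical = {i for i, t in enumerate(tokens) if LOGIC_PATTERN.match(t)}
    budget = max(len(critical), int(len(tokens) * keep_ratio))
    # Stand-in importance score for non-critical tokens: prefer longer
    # (more content-bearing) words; a trained pruner would score these.
    others = sorted(
        (i for i in range(len(tokens)) if i not in critical),
        key=lambda i: -len(tokens[i]),
    )
    kept = critical | set(others[: budget - len(critical)])
    return [t for i, t in enumerate(tokens) if i in kept]

cot = "so we add 3 and 4 to get 7 then multiply by 2 giving 14".split()
print(prune_cot(cot, keep_ratio=0.5))
```

Even at an aggressive keep ratio, every number in the trace survives, which is the property the paper's Logic-Preserving Distillation is designed to enforce across pruning ratios.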