🤖 AI Summary
Chain-of-thought (CoT) reasoning improves performance on complex tasks but suffers from high latency, substantial memory overhead, and error propagation due to lengthy explicit reasoning chains. To address this, we propose R-Capsule, the first framework to apply the information bottleneck principle to reasoning compression: it compresses explicit CoT traces into a small set of latent “reasoning capsules” via a low-capacity bottleneck network, preserving high-level planning structure. We introduce a dual-objective training scheme that jointly optimizes main-task accuracy and a plan-reconstruction loss, explicitly enforcing interpretability and structural fidelity in the latent space and thereby mitigating shortcut learning. Experiments across multiple complex reasoning benchmarks show that R-Capsule matches or exceeds CoT accuracy while reducing visible token count by 62% on average, significantly improving inference speed and memory efficiency. The method thus achieves a favorable trade-off among accuracy, efficiency, and transparency.
📝 Abstract
Chain-of-Thought (CoT) prompting helps Large Language Models (LLMs) tackle complex reasoning by eliciting explicit step-by-step rationales. However, CoT's verbosity increases latency and memory usage and may propagate early errors across long chains. We propose the Reasoning Capsule (R-Capsule), a framework that aims to combine the efficiency of latent reasoning with the transparency of explicit CoT. The core idea is to compress the high-level plan into a small set of learned latent tokens (a Reasoning Capsule) while keeping execution steps lightweight or explicit. This hybrid approach is inspired by the Information Bottleneck (IB) principle: we encourage the capsule to be approximately minimal yet sufficient for the task. Minimality is encouraged via a low-capacity bottleneck, which helps improve efficiency. Sufficiency is encouraged via a dual objective: a primary task loss for answer accuracy and an auxiliary plan-reconstruction loss that encourages the capsule to faithfully represent the original textual plan. The reconstruction objective helps ground the latent space, improving interpretability and reducing the use of uninformative shortcuts. Our framework balances efficiency, accuracy, and interpretability, reducing the visible token footprint of reasoning while maintaining or improving accuracy on complex benchmarks. Our code is available at: https://anonymous.4open.science/r/Reasoning-Capsule-7BE0
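The dual objective described above can be illustrated with a minimal sketch. This is not the authors' implementation: the mean-pooling bottleneck, the function names (`compress_to_capsule`, `dual_objective`), and the weighting parameter `recon_weight` are assumptions introduced here for illustration only. The abstract states only that a task loss and a plan-reconstruction loss are combined, and that the capsule passes through a low-capacity bottleneck.

```python
import math


def cross_entropy(logits, targets):
    """Mean token-level cross-entropy over a sequence.

    logits: list of per-token score lists; targets: list of token ids.
    """
    total = 0.0
    for row, target in zip(logits, targets):
        peak = max(row)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_norm - row[target]
    return total / len(targets)


def compress_to_capsule(plan_states, num_capsule_tokens):
    """Hypothetical bottleneck: mean-pool T plan vectors into K capsule vectors.

    A stand-in for a learned low-capacity network; requires T >= K.
    """
    length = len(plan_states)
    if length < num_capsule_tokens:
        raise ValueError("need at least as many plan states as capsule tokens")
    capsule = []
    for k in range(num_capsule_tokens):
        start = k * length // num_capsule_tokens
        end = (k + 1) * length // num_capsule_tokens
        chunk = plan_states[start:end]
        capsule.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return capsule


def dual_objective(answer_logits, answer_ids, plan_logits, plan_ids,
                   recon_weight=0.5):
    """Task loss plus weighted plan-reconstruction loss (weight is assumed)."""
    task_loss = cross_entropy(answer_logits, answer_ids)
    recon_loss = cross_entropy(plan_logits, plan_ids)
    return task_loss + recon_weight * recon_loss, task_loss, recon_loss
```

In this sketch, `answer_logits` would come from a decoder conditioned on the capsule, and `plan_logits` from an auxiliary decoder that tries to regenerate the textual plan from the capsule alone. A small `num_capsule_tokens` plays the role of the minimality pressure, while the reconstruction term plays the role of the sufficiency pressure.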