R-Capsule: Compressing High-Level Plans for Efficient Large Language Model Reasoning

📅 2025-09-26

🤖 AI Summary

Chain-of-thought (CoT) reasoning improves performance on complex tasks, but its long explicit reasoning chains cause high latency, substantial memory overhead, and error propagation. To address this, we propose R-Capsule, the first framework to apply the information bottleneck principle to reasoning compression: a low-capacity bottleneck network compresses explicit CoT traces into a small set of latent “reasoning capsules” that preserve the high-level planning structure. We introduce a dual-objective training scheme that jointly optimizes a main-task loss and a plan-reconstruction loss, explicitly enforcing interpretability and structural fidelity in the latent space and thereby mitigating shortcut learning. Experiments across multiple complex reasoning benchmarks show that R-Capsule matches or exceeds CoT accuracy while reducing visible token count by 62% on average, significantly improving inference speed and memory efficiency. The method thus achieves a favorable trade-off among accuracy, efficiency, and transparency.

📝 Abstract

Chain-of-Thought (CoT) prompting helps Large Language Models (LLMs) tackle complex reasoning by eliciting explicit step-by-step rationales. However, CoT's verbosity increases latency and memory usage and may propagate early errors across long chains. We propose the Reasoning Capsule (R-Capsule), a framework that aims to combine the efficiency of latent reasoning with the transparency of explicit CoT. The core idea is to compress the high-level plan into a small set of learned latent tokens (a Reasoning Capsule) while keeping execution steps lightweight or explicit. This hybrid approach is inspired by the Information Bottleneck (IB) principle: we encourage the capsule to be approximately minimal yet sufficient for the task. Minimality is encouraged via a low-capacity bottleneck, which helps improve efficiency. Sufficiency is encouraged via a dual objective: a primary task loss for answer accuracy and an auxiliary plan-reconstruction loss that encourages the capsule to faithfully represent the original textual plan. The reconstruction objective helps ground the latent space, improving interpretability and reducing the use of uninformative shortcuts. Our framework balances efficiency, accuracy, and interpretability, reducing the visible token footprint of reasoning while maintaining or improving accuracy on complex benchmarks. Our code is available at: https://anonymous.4open.science/r/Reasoning-Capsule-7BE0
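
The dual objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the names `sequence_nll`, `dual_objective`, and `recon_weight`, the use of mean token-level negative log-likelihood for both terms, and the weighted sum are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target token under one distribution."""
    return -math.log(probs[target_index])


def sequence_nll(step_probs, target_ids):
    """Mean token-level negative log-likelihood over a sequence."""
    losses = [cross_entropy(p, t) for p, t in zip(step_probs, target_ids)]
    return sum(losses) / len(losses)


def dual_objective(answer_probs, answer_ids, plan_probs, plan_ids,
                   recon_weight=0.5):
    """Hypothetical combined loss: task loss plus weighted reconstruction loss.

    answer_probs / answer_ids: predicted distributions and gold tokens for
        the final answer (the primary task term).
    plan_probs / plan_ids: distributions produced by a decoder that reads
        only the latent capsule, and the gold tokens of the textual plan
        (the auxiliary plan-reconstruction term).
    recon_weight: assumed trade-off coefficient, not taken from the paper.
    """
    task_loss = sequence_nll(answer_probs, answer_ids)
    recon_loss = sequence_nll(plan_probs, plan_ids)
    return task_loss + recon_weight * recon_loss


# Toy example: one answer token and one plan token.
loss = dual_objective(
    answer_probs=[[0.5, 0.5]], answer_ids=[0],
    plan_probs=[[0.25, 0.75]], plan_ids=[0],
    recon_weight=0.5,
)
print(loss)
```

In this sketch, a larger `recon_weight` pushes the capsule toward faithfully encoding the textual plan, while a smaller one prioritizes answer accuracy; how the paper actually balances the two terms is not stated in the abstract.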

Problem

Research questions and friction points this paper is trying to address.

Compressing verbose reasoning chains to reduce latency

Maintaining accuracy while minimizing token footprint

Balancing efficiency with interpretability in LLM reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Compresses high-level plans into learned latent tokens

Uses Information Bottleneck principle for minimal sufficient representation

Combines efficiency of latent reasoning with CoT transparency
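
For reference, the classical Information Bottleneck objective that the "minimal sufficient representation" item alludes to can be written as follows. This is the standard textbook form, not necessarily the exact objective optimized in the paper; the mapping of $X$ to the textual plan, $Z$ to the capsule, and $Y$ to the answer is an assumed reading of the abstract.

```latex
% Standard Information Bottleneck Lagrangian (assumed mapping:
% X = textual plan, Z = latent capsule, Y = task answer).
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

Here $I(X;Z)$ penalizes how much of the input the capsule retains (minimality), $I(Z;Y)$ rewards information about the answer (sufficiency), and $\beta > 0$ sets the trade-off between the two.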
