Revisiting LLM Reasoning via Information Bottleneck

📅 2025-07-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing reinforcement learning (RL) based methods for LLM reasoning rely on heuristic reward design and lack rigorous theoretical foundations. Method: Grounded in the information bottleneck (IB) principle, the paper proposes IBRO, the first framework to systematically integrate IB theory into LLM reasoning optimization. IBRO introduces a token-level surrogate objective that explicitly models the trade-off between information compression and generalization along reasoning paths, and derives from it a lightweight, plug-and-play IB regularization term that can be added to standard RL training pipelines with a single line of code. Results: IBRO consistently improves accuracy on mathematical reasoning benchmarks (GSM8K, MATH) and multi-step decision-making tasks (+3.2–5.7%), while demonstrating strong generalization and computational efficiency. The core contribution is an interpretable, information-theoretic framework for LLM reasoning that bridges theory and practice with an implementation-ready solution.
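To make the "single line of code" claim concrete, here is a minimal sketch of how an IB-style regularizer might be folded into a policy-gradient loss. The function name, the per-token entropy form of the regularizer, and the coefficient `beta` are all assumptions for illustration; the paper's actual surrogate objective may take a different form.

```python
def policy_loss_with_ib(logprobs, advantages, entropies, beta=0.01):
    """Sketch: REINFORCE-style policy-gradient loss plus a one-line
    IB-style regularizer over the reasoning trajectory.

    logprobs   : per-token log-probabilities of the sampled tokens
    advantages : per-token advantage estimates
    entropies  : per-token policy entropies (hypothetical stand-in
                 for the paper's token-level IB term)
    beta       : regularization strength (assumed name)
    """
    n = len(logprobs)
    # standard policy-gradient loss (minimized during training)
    pg_loss = -sum(lp * a for lp, a in zip(logprobs, advantages)) / n
    # hypothetical one-line IB regularization added to the loss
    return pg_loss - beta * sum(entropies) / n


# usage: a 2-token trajectory with unit advantages
loss = policy_loss_with_ib([-1.0, -2.0], [1.0, 1.0], [0.5, 0.5])
```

The point of the sketch is the shape of the change, not its exact form: the base RL objective is untouched, and the regularizer enters as a single additive term.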

📝 Abstract
Large language models (LLMs) have recently demonstrated remarkable progress in reasoning capabilities through reinforcement learning with verifiable rewards (RLVR). By leveraging simple rule-based rewards, RL effectively incentivizes LLMs to produce extended chain-of-thought (CoT) reasoning trajectories, progressively guiding them toward correct answers. However, existing approaches remain largely heuristic and intuition-driven, limiting the development of principled methodologies. In this paper, we present a theoretical characterization of LLM reasoning grounded in the information bottleneck (IB) principle, introducing IB-aware reasoning optimization (IBRO), a framework that encourages reasoning trajectories to be both informative about the final correct answer and generalizable across diverse prompts. We derive a practical token-level surrogate objective and propose an efficient approximation, resulting in the lightweight IB regularization method. This technique integrates seamlessly into existing RL-based post-training frameworks without additional computational overhead, requiring only a one-line code modification. Empirically, we validate IB regularization across multiple mathematical reasoning benchmarks and RL algorithms, demonstrating consistent improvements in LLM reasoning performance.
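For context, the trade-off the abstract describes (trajectories informative about the answer yet compressed and generalizable) matches the classical information bottleneck objective. Mapping the prompt to $X$, the reasoning trajectory to $Z$, and the final answer to $Y$ is an interpretation of the abstract, not the paper's exact formulation:

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

Here $I(\cdot\,;\cdot)$ denotes mutual information and $\beta > 0$ controls the trade-off: the first term compresses the trajectory's dependence on the specific prompt (aiding generalization), while the second keeps it informative about the correct answer. The paper's token-level surrogate presumably approximates a variant of this objective.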
Problem

Research questions and friction points this paper is trying to address.

Theoretical characterization of LLM reasoning via information bottleneck principle
Optimizing reasoning trajectories for informativeness and generalizability
Lightweight regularization for improved LLM reasoning performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages information bottleneck principle for reasoning
Introduces lightweight IB regularization method
Seamlessly integrates into RL-based frameworks