Efficient LLM Reasoning via Variational Posterior Guidance with Efficiency Awareness

📅 2026-05-10
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the inefficiency of large language models in complex reasoning, often caused by “overthinking,” and the limitations of existing reinforcement learning approaches due to suboptimal reward design and inefficient sampling. The authors propose the VPG-EA framework, which formulates efficient reasoning as a variational inference problem. It employs a parameter-shared dual-stream architecture to learn, respectively, a posterior distribution guided by reference answers and a prior policy. Cross-view evaluation is introduced to identify pseudo-efficient reasoning paths, and variational distillation enables unidirectional transfer of efficient patterns from the posterior to the prior. Notably, this approach establishes an efficiency-aware evidence lower bound grounded in cognitive science principles. Experiments on DeepSeek-R1-Distill-Qwen-1.5B and 7B demonstrate substantial improvements, with the composite efficiency metric ε³ increasing by 8.73% and 12.37% over the strongest baseline, respectively.
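For orientation, the textbook conditional evidence lower bound that a variational formulation of this kind would typically start from can be written as below. This is only the standard bound, not the paper's efficiency-aware variant, whose exact form is not given in this summary. The symbols are assumptions introduced here: x is the question, z the reasoning chain, y the reference answer, q the answer-guided posterior, and p the prior policy.

```latex
% Standard conditional ELBO (textbook form; the symbols x, z, y, q_\phi, p_\theta are assumed notation).
\log p_\theta(y \mid x)
  \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x, y)}\bigl[\log p_\theta(y \mid x, z)\bigr]
  \;-\;
  \mathrm{KL}\bigl(q_\phi(z \mid x, y)\,\big\|\,p_\theta(z \mid x)\bigr)
```

With a parameter-shared architecture, the two parameter sets would presumably coincide; how the efficiency term enters the bound is left to the paper itself.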
📝 Abstract
Although large language models rely on chain-of-thought for complex reasoning, the overthinking phenomenon severely degrades inference efficiency. Existing reinforcement learning methods compress reasoning chains by designing elaborate reward functions, which renders high-quality samples extremely sparse in the exploration space and creates a sampling bottleneck for the prior policy. Inspired by cognitive science, we theoretically prove that a posterior distribution guided by reference answers achieves higher expected utility than the prior distribution and can therefore break through the sampling bottleneck for high-quality samples. However, the posterior distribution is unavailable during inference. To this end, we formalize efficient reasoning as a variational inference problem and introduce an efficiency-aware evidence lower bound as the theoretical foundation. Based on this, we propose the VPG-EA framework. It adopts a parameter-shared dual-stream architecture to instantiate both the posterior distribution and the prior policy; after filtering out pseudo-efficient paths via cross-view evaluation, it unidirectionally transfers the posterior's efficient patterns to the prior policy through variational distillation. Experiments on DeepSeek-R1-Distill-Qwen at the 1.5B and 7B scales demonstrate that VPG-EA improves the comprehensive efficiency metric ε³ by 8.73% and 12.37%, respectively, over the strongest baseline at each model size.
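The unidirectional transfer described above can be illustrated with a toy sketch. It is not the authors' implementation: it assumes a small discrete set of candidate reasoning paths, a forward KL objective KL(posterior || prior), and plain gradient descent, none of which is confirmed by the abstract. The names `distill_step`, `posterior_probs`, and `prior_logits` are hypothetical. The point it shows is that only the prior is updated while the posterior is held fixed as the target.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(q, p):
    """KL(q || p) for two discrete distributions over the same support."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)

def distill_step(prior_logits, posterior_probs, lr=0.5):
    """One gradient step on the prior logits only (assumed objective: forward KL).

    The gradient of KL(q || softmax(logits)) with respect to the logits
    is softmax(logits) - q. The posterior q is read but never modified,
    which is what makes the transfer one-directional in this toy setting.
    """
    prior_probs = softmax(prior_logits)
    return [
        logit - lr * (p - q)
        for logit, p, q in zip(prior_logits, prior_probs, posterior_probs)
    ]

# Hypothetical toy setup: three candidate reasoning paths.
posterior_probs = [0.7, 0.2, 0.1]  # answer-guided posterior favours path 0
prior_logits = [0.0, 0.0, 0.0]     # prior starts out uniform

kl_before = kl_divergence(posterior_probs, softmax(prior_logits))
for _ in range(50):
    prior_logits = distill_step(prior_logits, posterior_probs)
kl_after = kl_divergence(posterior_probs, softmax(prior_logits))

print(f"KL before: {kl_before:.4f}, KL after: {kl_after:.6f}")
```

In a real training loop with shared parameters, holding the posterior fixed would usually be done with a stop-gradient on the posterior stream; the fixed list here only stands in for that idea.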
Problem

Research questions and friction points this paper is trying to address.

overthinking
reasoning efficiency
sampling bottleneck
chain-of-thought
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

variational inference
posterior guidance
efficiency-aware reasoning
chain-of-thought compression
variational distillation