🤖 AI Summary
This work addresses the trade-off between sampling budget and reasoning quality in large language model inference, where existing approaches are inefficient because they treat sampling width and depth separately. The authors propose the Dual-Dimensional Consistency (DDC) framework, which unifies quality assessment of reasoning paths and adaptive termination in a single framework. DDC uses a Confidence-Weighted Bayesian protocol to aggregate width-wise consensus and combines it with Trend-Aware Stratified Pruning to prioritize high-potential paths, mitigating both hallucination amplification and premature truncation of valid reasoning chains. Experiments across five benchmarks show that DDC reduces token consumption by more than 10 times while maintaining or improving accuracy relative to strong baselines.
📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable reasoning abilities. However, maximizing their potential through inference-time scaling involves a trade-off between sampling budget and reasoning quality. Current strategies remain inefficient because they typically treat sampling width and depth as orthogonal objectives: width consensus methods risk reinforcing hallucinations, while depth pruning mechanisms prematurely truncate complex yet valid reasoning chains. We therefore propose Dual-Dimensional Consistency (DDC), a unified framework that bridges path quality with adaptive termination. By coupling a Confidence-Weighted Bayesian protocol with Trend-Aware Stratified Pruning, our method concentrates computational resources on high-quality reasoning paths, filtering hallucinations while accelerating consensus. Evaluations across five benchmarks demonstrate that this approach reduces token consumption by over 10 times while maintaining or exceeding the accuracy of strong baselines across various LLMs.
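The abstract does not spell out the Confidence-Weighted Bayesian protocol, so the following is only a minimal sketch of the general idea of confidence-weighted consensus with adaptive termination, not the authors' algorithm. It assumes each sampled path yields a final answer and a scalar confidence in (0, 1), accumulates log-odds per answer in a naive-Bayes style, and stops sampling once the leading answer's normalized weight passes a threshold. The function name `weighted_consensus` and the parameters `threshold` and `min_samples` are hypothetical, and the depth-wise pruning component is not modeled here.

```python
import math
from collections import defaultdict


def weighted_consensus(samples, threshold=0.95, min_samples=3):
    """Aggregate (answer, confidence) pairs and stop early on consensus.

    Illustrative assumption: each path's confidence is treated as an
    independent probability that its answer is correct, so evidence is
    accumulated as log-odds per answer. This is not the paper's protocol.
    Returns (best_answer, posterior, n_used).
    """
    log_weights = defaultdict(float)
    best, posterior, n_used = None, 0.0, 0

    for answer, confidence in samples:
        n_used += 1
        # Clamp to avoid infinite log-odds at exactly 0 or 1.
        c = min(max(confidence, 1e-6), 1 - 1e-6)
        log_weights[answer] += math.log(c / (1 - c))

        # Normalize over the answers seen so far (softmax of log-odds).
        peak = max(log_weights.values())
        total = sum(math.exp(v - peak) for v in log_weights.values())
        best = max(log_weights, key=log_weights.get)
        posterior = math.exp(log_weights[best] - peak) / total

        # Adaptive termination: stop once the leader is confident enough.
        if n_used >= min_samples and posterior >= threshold:
            break

    return best, posterior, n_used


if __name__ == "__main__":
    paths = [("42", 0.9), ("42", 0.8), ("17", 0.6), ("42", 0.9), ("42", 0.9)]
    print(weighted_consensus(paths))
```

In this toy run, sampling stops after the third path because the leading answer already holds enough normalized weight, which illustrates how weighting votes by confidence can reduce the number of paths that must be generated.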