A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning

📅 2025-10-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of theoretical foundations for sampling-based test-time scaling methods in large language model (LLM) inference. We propose the first theoretical framework for analyzing such methods, grounded in confidence estimation, which formally characterizes the error sources of self-consistency and perplexity-based approaches. Building on this analysis, we introduce RPC, a hybrid method whose Perplexity Consistency component integrates the strengths of both paradigms and whose Reasoning Pruning component suppresses low-probability distracting reasoning paths. We provide rigorous theoretical guarantees showing that RPC improves the estimation-error convergence rate from linear to exponential while preserving model error. Extensive evaluation across seven benchmarks demonstrates that RPC matches self-consistency in reasoning accuracy, achieves superior confidence calibration, reduces sampling cost by 50%, and thereby enhances both inference efficiency and reliability.
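As a hedged sketch of the distinction the summary draws (the notation below is ours, not the paper's): self-consistency estimates an answer's confidence by Monte Carlo voting over n sampled reasoning paths, whereas a perplexity-consistency style estimator replaces each vote with the model's internal path probability, so each distinct sampled path contributes its exact probability mass rather than a noisy count.

```latex
% Hedged sketch, our notation: z_1, ..., z_n are sampled reasoning paths,
% a(z) is the final answer extracted from path z, and p_theta(z) is the
% model's internal probability of generating z.

% Self-consistency: Monte Carlo vote over the n sampled paths.
\[
  \hat{c}_{\mathrm{SC}}(a) \;=\; \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\!\left[\, a(z_i) = a \,\right]
\]

% Perplexity-consistency style estimator: weight each *distinct* sampled
% path by its internal probability, so once a path is sampled its exact
% mass is accounted for -- the intuition behind the faster convergence
% of estimation error claimed above.
\[
  \hat{c}_{\mathrm{PC}}(a) \;=\; \sum_{z \in \mathcal{Z}_n} p_\theta(z)\, \mathbb{1}\!\left[\, a(z) = a \,\right],
  \qquad \mathcal{Z}_n = \{\text{distinct paths among } z_1, \dots, z_n\}
\]
```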

📝 Abstract
Test-time scaling seeks to improve the reasoning performance of large language models (LLMs) by adding computational resources. A prevalent approach within the field is sampling-based test-time scaling methods, which enhance reasoning by generating multiple reasoning paths for a given input during inference. However, despite their practical success, their theoretical foundations remain underexplored. In this paper, we provide the first theoretical framework for analyzing sampling-based test-time scaling methods, grounded in the perspective of confidence estimation. Based on the framework, we analyze two dominant paradigms, self-consistency and perplexity, and reveal key limitations: self-consistency suffers from high estimation error, while perplexity exhibits substantial modeling error and possible degradation of estimation-error convergence. To address these limitations, we introduce RPC, a hybrid method that leverages our theoretical insights through two key components: Perplexity Consistency and Reasoning Pruning. Perplexity Consistency combines the strengths of self-consistency and perplexity, boosting the convergence rate of estimation error from linear to exponential while preserving model error. Reasoning Pruning prevents degradation by eliminating low-probability reasoning paths. Both theoretical analysis and empirical results across seven benchmark datasets demonstrate that RPC has strong potential for reducing reasoning error. Notably, RPC achieves reasoning performance comparable to self-consistency while not only enhancing confidence reliability but also reducing sampling costs by 50%. The code and resources are available at https://wnjxyk.github.io/RPC.
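Below is a minimal, self-contained Python sketch of the two estimators and of Reasoning Pruning as a probability threshold. All names and the (answer, log-probability) toy samples are hypothetical stand-ins for decoded reasoning paths; this is an illustration under our assumptions, not the authors' implementation.

```python
import math
from collections import defaultdict

def self_consistency(samples):
    """Self-consistency confidence: fraction of sampled paths voting for each answer."""
    votes = defaultdict(float)
    for answer, _ in samples:
        votes[answer] += 1.0 / len(samples)
    return dict(votes)

def perplexity_consistency(samples, prune_threshold=1e-4):
    """Perplexity-consistency-style confidence: weight each distinct path by its
    internal probability exp(logp), prune low-probability paths, then renormalize.
    (A real system would deduplicate on the decoded path text; the
    (answer, logp) pair is a stand-in here.)"""
    distinct = {(answer, logp): (answer, math.exp(logp)) for answer, logp in samples}
    kept = [(a, p) for a, p in distinct.values() if p >= prune_threshold]
    total = sum(p for _, p in kept) or 1.0
    conf = defaultdict(float)
    for answer, p in kept:
        conf[answer] += p / total
    return dict(conf)

# Toy usage: three sampled paths (one an exact duplicate) reach "42";
# one very low-probability distracting path reaches "41" and is pruned.
samples = [("42", math.log(0.30)), ("42", math.log(0.30)),
           ("42", math.log(0.25)), ("41", math.log(1e-5))]
print(self_consistency(samples))        # {'42': 0.75, '41': 0.25}
print(perplexity_consistency(samples))  # ~{'42': 1.0} after pruning
```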
Problem

Research questions and friction points this paper is trying to address.

Developing a theoretical framework for sampling-based test-time scaling methods
Addressing the limitations of self-consistency and perplexity in LLM reasoning
Proposing a hybrid method to reduce estimation error and sampling costs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid method (RPC) combining perplexity and self-consistency
Prunes low-probability reasoning paths to prevent convergence degradation
Boosts the estimation-error convergence rate from linear to exponential
Zhi Zhou
State Key Laboratory of Novel Software Technology, Nanjing University, China
Yuhao Tan
State Key Laboratory of Novel Software Technology, Nanjing University, China
Zenan Li
Department of Computer Science, ETH Zurich, Switzerland
Yuan Yao
State Key Laboratory of Novel Software Technology, Nanjing University, China
Lan-Zhe Guo
LAMDA Group, Nanjing University
Machine Learning
Yu-Feng Li
Professor, Nanjing University
Machine Learning
Xiaoxing Ma
Professor of Computer Science and Technology, Nanjing University
software engineering · self-adaptive systems · reliability of machine learning