Token-Level Uncertainty Estimation for Large Language Model Reasoning

📅 2025-05-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Large language models (LLMs) produce outputs of inconsistent quality in complex, multi-step mathematical reasoning, which makes trustworthy responses hard to identify. Method: The paper proposes a token-level uncertainty estimation framework that lets an LLM self-assess and self-improve its generations. It applies low-rank random weight perturbation during decoding to obtain predictive distributions, estimates token-level uncertainties from them, and aggregates these to reflect the semantic uncertainty of the generated sequence. It also explores using uncertainty to guide reasoning through multiple generations and particle filtering. Results: On mathematical reasoning datasets of varying difficulty, the token-level uncertainty metrics correlate strongly with answer correctness and model robustness, and the approach consistently outperforms existing uncertainty estimation methods.

📝 Abstract

While Large Language Models (LLMs) have demonstrated impressive capabilities, their output quality remains inconsistent across various application scenarios, making it difficult to identify trustworthy responses, especially in complex tasks requiring multi-step reasoning. In this paper, we propose a token-level uncertainty estimation framework to enable LLMs to self-assess and self-improve their generation quality in mathematical reasoning. Specifically, we introduce low-rank random weight perturbation to LLM decoding, generating predictive distributions that we use to estimate token-level uncertainties. We then aggregate these uncertainties to reflect semantic uncertainty of the generated sequences. Experiments on mathematical reasoning datasets of varying difficulty demonstrate that our token-level uncertainty metrics strongly correlate with answer correctness and model robustness. Additionally, we explore using uncertainty to directly enhance the model's reasoning performance through multiple generations and the particle filtering algorithm. Our approach consistently outperforms existing uncertainty estimation methods, establishing effective uncertainty estimation as a valuable tool for both evaluating and improving reasoning generation in LLMs.
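
The perturb-then-measure idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: a toy linear output layer stands in for the LLM, the low-rank noise is plain Gaussian (the product of two random factors), and the split into total, aleatoric, and epistemic uncertainty is the standard entropy-based decomposition. The names `token_uncertainty`, `rank`, `sigma`, and `n_samples` are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p, eps=1e-12):
    # Shannon entropy (in nats) over the last axis.
    return -(p * np.log(p + eps)).sum(axis=-1)

def token_uncertainty(W, h, rank=2, sigma=0.1, n_samples=16, seed=0):
    """Estimate uncertainty for one token position.

    W: (vocab, d) toy output weight matrix (stand-in for an LLM layer).
    h: (d,) hidden state at the current decoding step.
    """
    rng = np.random.default_rng(seed)
    vocab, d = W.shape
    probs = []
    for _ in range(n_samples):
        # Assumed noise model: low-rank Gaussian perturbation A @ B.
        A = rng.normal(size=(vocab, rank))
        B = rng.normal(size=(rank, d))
        delta = sigma * (A @ B) / np.sqrt(rank)
        probs.append(softmax((W + delta) @ h))
    probs = np.stack(probs)                 # (n_samples, vocab)
    total = entropy(probs.mean(axis=0))     # entropy of the averaged prediction
    aleatoric = entropy(probs).mean()       # average entropy per sample
    epistemic = total - aleatoric           # disagreement between samples
    return total, aleatoric, epistemic
```

A sequence-level score could then be obtained by aggregating the per-token values, for example by averaging them over the generated tokens; the aggregation actually used in the paper is not specified in this summary.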

Problem

Research questions and friction points this paper is trying to address.

Estimating token-level uncertainty in LLM reasoning outputs

Improving LLM self-assessment for trustworthy mathematical reasoning

Using uncertainty estimates to directly improve reasoning performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Token-level uncertainty estimation for LLM reasoning

Low-rank random weight perturbation in decoding

Uncertainty-guided particle filtering to improve reasoning performance
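
As a rough illustration of how uncertainty might steer a particle filter over partial reasoning traces, the sketch below shows a single resampling step. It is a hedged example under stated assumptions: each particle is a partial solution with an already computed uncertainty score, and the weights are a softmax of the negative uncertainty. The function `resample_particles` and the `temperature` parameter are hypothetical, and the weighting used in the paper may differ.

```python
import numpy as np

def resample_particles(particles, uncertainties, temperature=1.0, seed=0):
    """Resample partial reasoning traces, favoring low-uncertainty ones.

    particles: list of partial solutions (any objects, e.g. strings).
    uncertainties: one non-negative score per particle (lower is better).
    """
    rng = np.random.default_rng(seed)
    u = np.asarray(uncertainties, dtype=float)
    # Assumed weighting: softmax of negative uncertainty.
    logits = -u / temperature
    logits -= logits.max()              # stabilize the exponent
    weights = np.exp(logits)
    weights /= weights.sum()
    # Multinomial resampling with replacement.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return [particles[i] for i in idx], weights

# Usage: three candidate reasoning steps with toy uncertainty scores.
steps = ["step A", "step B", "step C"]
survivors, w = resample_particles(steps, [0.1, 5.0, 0.2])
```

In a full loop, each surviving particle would be extended by one more reasoning step, rescored, and resampled again until the traces terminate.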

🔎 Similar Papers

Reasoning over Uncertain Text by Generative Large Language Models