Improving reasoning at inference time via uncertainty minimisation

📅 2026-03-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational cost of existing inference-time scaling methods for large language models, which often rely on extensive sampling or external verifiers and struggle to efficiently enhance multi-step reasoning. The authors propose a lightweight inference mechanism that frames reasoning as an uncertainty minimisation problem. By leveraging the model's internal prediction distributions at the thought level, rather than the token level, the method computes confidence scores and employs a greedy strategy to select the most certain subsequent reasoning path. Requiring no external feedback and operating effectively with minimal sampling, the approach significantly improves performance, particularly on open-ended problems. Experiments show it outperforms greedy decoding and matches or exceeds self-consistency on MATH500 and GSM8K. Cross-lingual evaluations further demonstrate its generalisability, and early-stage decisions are shown to reliably predict final answer accuracy.

📝 Abstract
Large language models (LLMs) now exhibit strong multi-step reasoning abilities, but existing inference-time scaling methods remain computationally expensive, often relying on extensive sampling or external evaluators. We propose a principled strategy that frames reasoning as uncertainty minimisation and operates at the level of individual thoughts rather than tokens. At each reasoning step, our method selects the continuation that maximises the model's self-certainty, a metric computed from its internal predictive distribution. This approach achieves significant improvements with a small number of samples, relies exclusively on model-internal signals, and, unlike methods such as majority voting, applies to open-ended questions. Experiments on MATH500 and GSM8K across multiple model sizes demonstrate that thought-level self-certainty maximisation consistently outperforms greedy decoding and matches or exceeds self-consistency under comparable token budgets. Cross-lingual evaluations further indicate that the method transfers robustly beyond high-resource languages. Moreover, analysis of self-certainty dynamics reveals that correct reasoning trajectories converge early to stable paths, suggesting that early decisions, likely associated with the planning of the reasoning process, are predictive of final accuracy. Building on this result, we show that self-certainty maximisation applied to the early steps explains most of the performance gain, yielding a simple yet efficient inference-time scaling method.
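The thought-level selection loop described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes self-certainty is the mean KL divergence of each token distribution from the uniform distribution (one common instantiation of confidence from internal predictive distributions; the paper defines its own metric), and `sample_thoughts` is a hypothetical stand-in for sampling candidate thoughts from an LLM.

```python
import math


def self_certainty(token_dists):
    """Score a candidate thought by how peaked its token distributions are.

    Assumed metric: average KL(p || uniform) over the thought's tokens,
    i.e. log V + sum_i p_i log p_i. Higher means more certain; a uniform
    distribution scores exactly 0.
    """
    kls = []
    for p in token_dists:
        vocab_size = len(p)
        kls.append(math.log(vocab_size) + sum(x * math.log(x) for x in p if x > 0))
    return sum(kls) / len(kls)


def greedy_thought_selection(sample_thoughts, steps, k):
    """At each reasoning step, draw k candidate thoughts and greedily keep
    the one whose internal token distributions are most certain.

    `sample_thoughts(path, k)` returns a list of (text, token_dists) pairs;
    no external verifier or feedback is consulted.
    """
    path = []
    for _ in range(steps):
        candidates = sample_thoughts(path, k)
        best_text, _ = max(candidates, key=lambda c: self_certainty(c[1]))
        path.append(best_text)
    return path


if __name__ == "__main__":
    import random

    random.seed(0)

    def sample_thoughts(path, k):
        # Hypothetical stub in place of an LLM: each candidate thought
        # carries per-token probability distributions of varying peakedness.
        cands = []
        for i in range(k):
            peak = random.uniform(0.3, 0.95)
            rest = (1.0 - peak) / 3
            dists = [[peak, rest, rest, rest] for _ in range(5)]
            cands.append((f"thought-{len(path)}-{i}", dists))
        return cands

    print(greedy_thought_selection(sample_thoughts, steps=3, k=4))
```

Because the score uses only the model's own distributions, the loop needs no reward model or majority vote, which is what lets it run with a small sampling budget and apply to open-ended answers.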
Problem

Research questions and friction points this paper is trying to address.

reasoning
inference-time scaling
uncertainty minimisation
large language models
computational efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

uncertainty minimisation
self-certainty
thought-level reasoning
inference-time scaling
reasoning dynamics