Towards Reliable Truth-Aligned Uncertainty Estimation in Large Language Models

📅 2026-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing uncertainty estimation metrics for large language models exhibit unstable performance across configurations and struggle to reliably detect hallucinations, primarily because they are not explicitly aligned with factual correctness. This work formally introduces the problem of “proxy failure” and proposes Truth AnChoring (TAC), a post-hoc approach that calibrates raw uncertainty scores into truth-aligned reliability measures. By combining few-shot supervision with score remapping, TAC learns stable and discriminative uncertainty estimates even in low-information settings. Experiments show that TAC substantially improves the reliability and robustness of uncertainty estimation, mitigating the degradation that traditional methods suffer under data-scarce or noisy conditions.
📝 Abstract
Uncertainty estimation (UE) aims to detect hallucinated outputs of large language models (LLMs) to improve their reliability. However, UE metrics often exhibit unstable performance across configurations, which significantly limits their applicability. In this work, we formalise this phenomenon as proxy failure, since most UE metrics originate from model behaviour rather than being explicitly grounded in the factual correctness of LLM outputs. We then show that UE metrics become non-discriminative precisely in low-information regimes. To alleviate this, we propose Truth AnChoring (TAC), a post-hoc calibration method that remedies UE metrics by mapping their raw scores to truth-aligned scores. Even with noisy, few-shot supervision, TAC supports the learning of well-calibrated uncertainty estimates and yields a practical calibration protocol. Our findings highlight the limitations of treating heuristic UE metrics as direct indicators of truth uncertainty, and position TAC as a necessary step toward more reliable uncertainty estimation for LLMs. The code repository is available at https://github.com/ponhvoan/TruthAnchor/.
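The abstract describes TAC only at a high level, so the paper's actual formulation is not reproduced here. As an illustrative sketch, the code below shows one way a post-hoc, truth-anchored remapping of raw UE scores could look, assuming a Platt-style logistic calibrator fit on a handful of correctness labels. The function names (fit_truth_anchor, truth_aligned_score) and the choice of scikit-learn's LogisticRegression are assumptions for illustration, not the authors' TAC implementation.

```python
# Minimal sketch of post-hoc "truth-anchored" calibration of raw UE scores.
# NOT the authors' TAC implementation; it only illustrates remapping heuristic
# uncertainty scores with few-shot factual-correctness labels, here via
# Platt-style logistic calibration (scikit-learn).
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_truth_anchor(raw_scores, correctness_labels):
    """Fit a remapping from raw UE scores to P(output is factually correct).

    raw_scores: heuristic uncertainty scores (e.g. sequence entropy)
    correctness_labels: few-shot 0/1 labels of factual correctness
    """
    X = np.asarray(raw_scores, dtype=float).reshape(-1, 1)
    y = np.asarray(correctness_labels, dtype=int)
    calibrator = LogisticRegression()
    calibrator.fit(X, y)
    return calibrator

def truth_aligned_score(calibrator, raw_scores):
    """Map raw UE scores to calibrated, truth-aligned reliability scores."""
    X = np.asarray(raw_scores, dtype=float).reshape(-1, 1)
    return calibrator.predict_proba(X)[:, 1]

if __name__ == "__main__":
    # Calibrate on a few labelled outputs, then score new generations.
    few_shot_scores = [0.2, 0.9, 0.4, 1.3, 0.1, 1.1]  # raw uncertainty (higher = less confident)
    few_shot_labels = [1, 0, 1, 0, 1, 0]              # 1 = factually correct
    cal = fit_truth_anchor(few_shot_scores, few_shot_labels)
    print(truth_aligned_score(cal, [0.15, 1.2]))
```

Any monotone few-shot remapping (e.g. isotonic regression) could be substituted for the logistic calibrator in this sketch; the paper should be consulted for how TAC actually performs the score remapping.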
Problem

Research questions and friction points this paper is trying to address.

uncertainty estimation
large language models
hallucination detection
truth alignment
proxy failure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uncertainty Estimation
Truth Alignment
Proxy Failure
Post-hoc Calibration
Large Language Models
Ponhvoan Srey
Nanyang Technological University, Singapore
Quang Minh Nguyen
KAIST, South Korea
Xiaobao Wu
Research Scientist, Nanyang Technological University
Large Language Models · Machine Learning · Natural Language Processing
Anh Tuan Luu
Nanyang Technological University, Singapore