UNCERTAINTY-LINE: Length-Invariant Estimation of Uncertainty for Large Language Models

📅 2025-05-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing uncertainty quantification (UQ) methods rely on token-level probabilities, inducing output-length bias—even after length normalization, residual bias persists. This work is the first to systematically identify and characterize this fundamental issue. We propose UNCERTAINTY-LINE, a model-agnostic, post-hoc debiasing framework: it regresses raw UQ scores (e.g., entropy, confidence, MC Dropout variance) against output length as a covariate and uses the regression residuals as length-invariant reliability estimates. UNCERTAINTY-LINE is compatible with diverse UQ metrics and large language models (LLMs). Empirically, it substantially improves calibration (↓ Expected Calibration Error), discrimination (↑ Area Under ROC Curve), and ranking consistency (↑ Spearman correlation) across machine translation, summarization, and question answering—outperforming state-of-the-art UQ baselines uniformly.
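The core procedure — regressing raw UQ scores on output length and keeping the residuals — can be sketched in a few lines. This is a minimal illustration assuming a plain ordinary-least-squares fit with an intercept; the function name and the choice of OLS are ours, not taken from the paper, which may use a different regression form.

```python
import numpy as np

def debias_uncertainty(scores, lengths):
    """Regress raw uncertainty scores on output length (OLS with an
    intercept) and return the residuals as length-invariant estimates."""
    scores = np.asarray(scores, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    # Design matrix: [1, length] -> fit scores ~ a + b * length
    X = np.column_stack([np.ones_like(lengths), lengths])
    coef, *_ = np.linalg.lstsq(X, scores, rcond=None)
    # Residuals are, by construction, uncorrelated with output length
    return scores - X @ coef
```

By the normal equations of least squares, the returned residuals are orthogonal to the length covariate, which is what makes the corrected scores length-invariant.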

📝 Abstract
Large Language Models (LLMs) have become indispensable tools across various applications, making it more important than ever to ensure the quality and trustworthiness of their outputs. This has led to growing interest in uncertainty quantification (UQ) methods for assessing the reliability of LLM outputs. Many existing UQ techniques rely on token probabilities, which inadvertently introduces a bias with respect to output length. While some methods attempt to account for this, we demonstrate that such biases persist even in length-normalized approaches. To address the problem, we propose UNCERTAINTY-LINE (Length-INvariant Estimation), a simple debiasing procedure that regresses uncertainty scores on output length and uses the residuals as corrected, length-invariant estimates. Our method is post-hoc, model-agnostic, and applicable to a range of UQ measures. Through extensive evaluation on machine translation, summarization, and question-answering tasks, we demonstrate that UNCERTAINTY-LINE consistently improves uncertainty estimates over even nominally length-normalized UQ methods, across multiple metrics and models.
Problem

Research questions and friction points this paper is trying to address.

Addressing bias in uncertainty quantification due to output length
Proposing a length-invariant method for reliable uncertainty estimation
Improving uncertainty measures across diverse tasks and models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Length-invariant uncertainty estimation method
Post-hoc debiasing using regression residuals
Model-agnostic for various UQ measures
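The post-hoc, model-agnostic nature of the residual correction can be seen on synthetic data: a raw score that mixes a genuine uncertainty signal with a spurious length term loses its length correlation after debiasing while keeping its correlation with the signal. Everything below (the data-generating process, coefficients, and sample size) is a hypothetical illustration of ours, not an experiment from the paper.

```python
import numpy as np

# Hypothetical synthetic setup: raw score = true signal + length bias + noise
rng = np.random.default_rng(0)
lengths = rng.integers(5, 120, size=500).astype(float)
signal = rng.normal(size=500)                      # the "real" reliability signal
raw = signal + 0.05 * lengths + rng.normal(scale=0.1, size=500)

# UNCERTAINTY-LINE-style correction: regress raw scores on length, keep residuals
X = np.column_stack([np.ones_like(lengths), lengths])
coef, *_ = np.linalg.lstsq(X, raw, rcond=None)
corrected = raw - X @ coef

bias_before = abs(np.corrcoef(raw, lengths)[0, 1])        # strong length bias
bias_after = abs(np.corrcoef(corrected, lengths)[0, 1])   # near zero
signal_corr = abs(np.corrcoef(corrected, signal)[0, 1])   # signal preserved
```

Because the correction touches only the scores and lengths, it can wrap any UQ measure (entropy, confidence, sampling variance) from any model without retraining, which is the sense in which the method is post-hoc and model-agnostic.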