UNCERTAINTY-LINE: Length-Invariant Estimation of Uncertainty for Large Language Models

📅 2025-05-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing uncertainty quantification (UQ) methods rely on token-level probabilities, inducing output-length bias—even after length normalization, residual bias persists. This work is the first to systematically identify and characterize this fundamental issue. We propose UNCERTAINTY-LINE, a model-agnostic, post-hoc debiasing framework: it regresses raw UQ scores (e.g., entropy, confidence, MC Dropout variance) against output length as a covariate and uses the regression residuals as length-invariant reliability estimates. UNCERTAINTY-LINE is compatible with diverse UQ metrics and large language models (LLMs). Empirically, it substantially improves calibration (↓ Expected Calibration Error), discrimination (↑ Area Under ROC Curve), and ranking consistency (↑ Spearman correlation) across machine translation, summarization, and question answering—outperforming state-of-the-art UQ baselines uniformly.
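The core procedure — regressing raw UQ scores on output length and keeping the residuals — can be sketched in a few lines. This is a minimal illustration assuming a plain ordinary-least-squares fit with an intercept; the function name and the choice of OLS are ours, not taken from the paper, which may use a different regression form.

```python
import numpy as np

def debias_uncertainty(scores, lengths):
    """Regress raw uncertainty scores on output length (OLS with an
    intercept) and return the residuals as length-invariant estimates."""
    scores = np.asarray(scores, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    # Design matrix: [1, length] -> fit scores ~ a + b * length
    X = np.column_stack([np.ones_like(lengths), lengths])
    coef, *_ = np.linalg.lstsq(X, scores, rcond=None)
    # Residuals are, by construction, uncorrelated with output length
    return scores - X @ coef
```

By the normal equations of least squares, the returned residuals are orthogonal to the length covariate, which is what makes the corrected scores length-invariant.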

📝 Abstract
Large Language Models (LLMs) have become indispensable tools across various applications, making it more important than ever to ensure the quality and trustworthiness of their outputs. This has led to growing interest in uncertainty quantification (UQ) methods for assessing the reliability of LLM outputs. Many existing UQ techniques rely on token probabilities, which inadvertently introduces a bias with respect to output length. While some methods attempt to account for this, we demonstrate that such biases persist even in length-normalized approaches. To address the problem, we propose UNCERTAINTY-LINE (Length-INvariant Estimation), a simple debiasing procedure that regresses uncertainty scores on output length and uses the residuals as corrected, length-invariant estimates. Our method is post-hoc, model-agnostic, and applicable to a range of UQ measures. Through extensive evaluation on machine translation, summarization, and question-answering tasks, we demonstrate that UNCERTAINTY-LINE consistently improves uncertainty estimates over even nominally length-normalized UQ methods, across multiple metrics and models.
Problem

Research questions and friction points this paper is trying to address.

Addressing bias in uncertainty quantification due to output length
Proposing a length-invariant method for reliable uncertainty estimation
Improving uncertainty measures across diverse tasks and models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Length-invariant uncertainty estimation method
Post-hoc debiasing using regression residuals
Model-agnostic for various UQ measures
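The post-hoc, model-agnostic nature of the residual correction can be seen on synthetic data: a raw score that mixes a genuine uncertainty signal with a spurious length term loses its length correlation after debiasing while keeping its correlation with the signal. Everything below (the data-generating process, coefficients, and sample size) is a hypothetical illustration of ours, not an experiment from the paper.

```python
import numpy as np

# Hypothetical synthetic setup: raw score = true signal + length bias + noise
rng = np.random.default_rng(0)
lengths = rng.integers(5, 120, size=500).astype(float)
signal = rng.normal(size=500)                      # the "real" reliability signal
raw = signal + 0.05 * lengths + rng.normal(scale=0.1, size=500)

# UNCERTAINTY-LINE-style correction: regress raw scores on length, keep residuals
X = np.column_stack([np.ones_like(lengths), lengths])
coef, *_ = np.linalg.lstsq(X, raw, rcond=None)
corrected = raw - X @ coef

bias_before = abs(np.corrcoef(raw, lengths)[0, 1])        # strong length bias
bias_after = abs(np.corrcoef(corrected, lengths)[0, 1])   # near zero
signal_corr = abs(np.corrcoef(corrected, signal)[0, 1])   # signal preserved
```

Because the correction touches only the scores and lengths, it can wrap any UQ measure (entropy, confidence, sampling variance) from any model without retraining, which is the sense in which the method is post-hoc and model-agnostic.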