🤖 AI Summary
Existing uncertainty quantification (UQ) research focuses predominantly on closed-book factual question answering, neglecting the more realistic and challenging setting of contextual question answering (contextual QA). Method: This paper presents the first systematic investigation of epistemic uncertainty quantification for large language models (LLMs) in contextual QA. We propose a semantic-feature-gap-based theoretical framework that formalizes uncertainty as the semantic distance between the model's internal representation and that of an ideal, perfectly prompted reference model, disentangling it into three interpretable features: context-reliance, context comprehension, and honesty. Our approach integrates cross-entropy decomposition, ideal-model approximation, and top-down semantic feature extraction, enabling efficient, sampling-free, low-overhead uncertainty estimation with minimal supervision. Results: Extensive evaluations across multiple in-distribution and out-of-distribution benchmarks demonstrate consistent superiority over state-of-the-art unsupervised and supervised UQ methods, achieving up to a 13.0-point improvement in Prediction Rejection Ratio (PRR).
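The cross-entropy decomposition mentioned above is, in its standard form (the notation below is ours, not necessarily the paper's): the token-level cross-entropy between the true distribution $p^{*}$ and the model's predictive distribution $q$ splits into an entropy term and a KL term,

$$
H(p^{*}, q) \;=\; \underbrace{H(p^{*})}_{\text{aleatoric}} \;+\; \underbrace{D_{\mathrm{KL}}\!\left(p^{*} \,\|\, q\right)}_{\text{epistemic}},
$$

where the paper approximates $p^{*}$ by the predictive distribution of an idealized, perfectly prompted reference model, so that the KL term isolates the epistemic component of the uncertainty.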
📝 Abstract
Uncertainty Quantification (UQ) research has primarily focused on closed-book factual question answering (QA), while contextual QA remains unexplored, despite its importance in real-world applications. In this work, we focus on UQ for the contextual QA task and propose a theoretically grounded approach to quantify epistemic uncertainty. We begin by introducing a task-agnostic, token-level uncertainty measure defined as the cross-entropy between the predictive distribution of the given model and the unknown true distribution. By decomposing this measure, we isolate the epistemic component and approximate the true distribution by a perfectly prompted, idealized model. We then derive an upper bound for epistemic uncertainty and show that it can be interpreted as semantic feature gaps in the given model's hidden representations relative to the ideal model. We further apply this generic framework to the contextual QA task and hypothesize that three features approximate this gap: context-reliance (using the provided context rather than parametric knowledge), context comprehension (extracting relevant information from context), and honesty (avoiding intentional lies). Using a top-down interpretability approach, we extract these features with only a small number of labeled samples and ensemble them into a robust uncertainty score. Experiments on multiple QA benchmarks in both in-distribution and out-of-distribution settings show that our method substantially outperforms state-of-the-art unsupervised (sampling-free and sampling-based) and supervised UQ methods, achieving up to a 13-point PRR improvement while incurring negligible inference overhead.
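The pipeline described in the abstract can be sketched in a few lines. The sketch below is illustrative only, not the authors' implementation: the toy distributions, the random stand-in hidden state, the probe weights, and the uniform averaging are all assumptions. The first half shows the standard cross-entropy identity (entropy plus KL, with the KL term playing the epistemic role once the true distribution is approximated by an idealized model); the second half shows linear probes over a hidden representation, one per semantic feature, ensembled into a single score.

```python
import numpy as np

def cross_entropy_decomposition(p_true, q_model):
    """Decompose H(p*, q) = H(p*) + KL(p* || q).

    H(p*) is the aleatoric (irreducible) part; KL(p* || q) is the
    epistemic part once p* is approximated by an idealized model.
    """
    h_true = -np.sum(p_true * np.log(p_true))        # entropy of p*
    kl = np.sum(p_true * np.log(p_true / q_model))   # KL(p* || q)
    return h_true, kl

# Toy next-token distributions over a 4-token vocabulary (illustrative).
p_star = np.array([0.7, 0.1, 0.1, 0.1])  # idealized (perfectly prompted) model
q = np.array([0.4, 0.3, 0.2, 0.1])       # given model

aleatoric, epistemic = cross_entropy_decomposition(p_star, q)
# Sanity check: the two parts sum back to the cross-entropy H(p*, q).
assert np.isclose(aleatoric + epistemic, -np.sum(p_star * np.log(q)))

def probe_scores(hidden, probes):
    """Score a hidden state with one linear probe per semantic feature
    and average the sigmoid outputs into a single uncertainty score.

    In the paper's setting the probes would be fit on a small number of
    labeled samples; here the weights are random stand-ins.
    """
    logits = {name: float(w @ hidden + b) for name, (w, b) in probes.items()}
    scores = {name: 1.0 / (1.0 + np.exp(-z)) for name, z in logits.items()}
    return scores, float(np.mean(list(scores.values())))

rng = np.random.default_rng(0)
d = 16
hidden_state = rng.normal(size=d)  # stand-in for an LLM hidden representation
probes = {
    name: (rng.normal(size=d), 0.0)  # (weight vector, bias) per feature
    for name in ("context_reliance", "context_comprehension", "honesty")
}
per_feature, uncertainty = probe_scores(hidden_state, probes)
```

The decomposition check confirms term-by-term that entropy plus KL recovers the cross-entropy; in the paper, estimating only the feature-gap (KL-like) part is what makes the method sampling-free and cheap at inference time.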