Why Uncertainty Estimation Methods Fall Short in RAG: An Axiomatic Analysis

📅 2025-05-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing uncertainty estimation (UE) methods fail to reliably reflect the correctness of large language model (LLM) responses in retrieval-augmented generation (RAG), hindering trustworthy RAG deployment. To address this, the paper proposes a RAG-specific axiomatic uncertainty framework: five formal axioms that expose structural deficiencies in prevailing UE approaches. Building on this framework, it designs an uncertainty calibration function that satisfies more axioms than existing methods. Experiments across multiple datasets, LLMs, and retrievers show that no mainstream UE method satisfies all five axioms, and that the calibrated approach significantly improves the correlation between uncertainty scores and response correctness, with an average Spearman rank correlation increase of 0.28. The work thus offers both a verifiable theoretical foundation and a practical tool for assessing RAG reliability.
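The summary evaluates UE quality by the Spearman rank correlation between uncertainty scores and response correctness. As a minimal illustration of that evaluation protocol (the scores below are toy values, not the paper's data), a tie-aware Spearman correlation can be computed from scratch:

```python
def ranks(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean rank of positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Toy example: a well-behaved UE method assigns high uncertainty to wrong
# answers, so uncertainty and correctness should correlate strongly negatively.
uncertainty = [0.9, 0.2, 0.7, 0.1, 0.5]
correctness = [0.0, 1.0, 0.0, 1.0, 1.0]
print(round(spearman(uncertainty, correctness), 3))  # → -0.866
```

In practice one would use `scipy.stats.spearmanr`; the hand-rolled version just makes the ranking-with-ties step explicit.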

📝 Abstract
Large Language Models (LLMs) are valued for their strong performance across various tasks, but they also produce inaccurate or misleading outputs. Uncertainty Estimation (UE) quantifies the model's confidence and helps users assess response reliability. However, existing UE methods have not been thoroughly examined in scenarios like Retrieval-Augmented Generation (RAG), where the input prompt includes non-parametric knowledge. This paper shows that current UE methods cannot reliably assess correctness in the RAG setting. We further propose an axiomatic framework to identify deficiencies in existing methods and guide the development of improved approaches. Our framework introduces five constraints that an effective UE method should meet after incorporating retrieved documents into the LLM's prompt. Experimental results reveal that no existing UE method fully satisfies all the axioms, explaining their suboptimal performance in RAG. We further introduce a simple yet effective calibration function based on our framework, which not only satisfies more axioms than baseline methods but also improves the correlation between uncertainty estimates and correctness.
Problem

Research questions and friction points this paper is trying to address.

UE methods are largely unexamined in RAG, where prompts include non-parametric (retrieved) knowledge
Existing UE scores do not reliably track response correctness in the RAG setting
No existing UE method satisfies all the constraints an effective estimator should meet
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes an axiomatic framework for UE in RAG
Introduces five formal constraints (axioms) an effective UE method should satisfy after retrieved documents enter the prompt
Develops a calibration function that satisfies more axioms than baselines and improves the uncertainty-correctness correlation
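The paper's actual calibration function is not reproduced in this summary. Purely as an illustrative sketch of what an axiom-guided calibration might look like, one could adjust a base uncertainty score by a retrieval-relevance signal so that stronger retrieved evidence lowers calibrated uncertainty and weaker evidence raises it (the function, the `alpha` parameter, and the linear form are all hypothetical, not the paper's method):

```python
def calibrated_uncertainty(base_uncertainty, doc_relevance, alpha=0.5):
    """Hypothetical calibration sketch, not the paper's formula.

    base_uncertainty: uncainty score from any UE method, in [0, 1]
    doc_relevance:    relevance of the retrieved evidence, in [0, 1]
    alpha:            weight of the retrieval adjustment, in [0, 1]

    Monotonicity mirrors two axiom-like constraints: the output rises
    with base uncertainty and falls as retrieved evidence gets stronger.
    """
    adjustment = alpha * (1.0 - doc_relevance)  # weak evidence -> more uncertainty
    u = (1.0 - alpha) * base_uncertainty + adjustment
    return min(1.0, max(0.0, u))  # clamp to [0, 1]

# Same base uncertainty, different evidence quality:
print(calibrated_uncertainty(0.8, doc_relevance=0.9))  # strong evidence, lower
print(calibrated_uncertainty(0.8, doc_relevance=0.1))  # weak evidence, higher
```

The point of the sketch is only the shape of the constraints: a calibration that satisfies the framework's axioms should behave monotonically in both inputs, whatever its exact functional form.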