Uncertainty Quantification for Retrieval-Augmented Reasoning

📅 2025-10-13

🤖 AI Summary

Problem: Retrieval-augmented reasoning (RAR) couples retrieval and generation over multiple steps, which makes its uncertainty hard to quantify accurately. Existing uncertainty quantification (UQ) methods are largely designed for settings with no retrieval or single-step retrieval and do not model the joint uncertainty across the retrieval and generation components. Method: The paper proposes R2C, a UQ method for RAR that perturbs reasoning steps to change the retriever's input, which in turn changes the generator's input at the next step, and uses the consistency of the resulting outputs, together with this iterative feedback, to characterize end-to-end uncertainty. The resulting confidence signal supports abstention and model selection. Results: Evaluated on five RAR systems across diverse question-answering datasets, R2C improves AUROC by over 5% on average compared to state-of-the-art UQ baselines. In extrinsic evaluations, it improves F1Abstain and AccAbstain by approximately 5% each, and improves exact match in model selection by approximately 7% over single models and approximately 3% over selection methods.
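To make the evaluation terms above concrete, the following is a minimal sketch of how a confidence score could be scored with AUROC and then used for abstention and model selection. It is not the paper's evaluation code: the function names, the 0.5 threshold, and the pick-the-highest-confidence selection rule are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple


def auroc(confidences: List[float], correct: List[bool]) -> float:
    """Chance that a correct answer receives higher confidence than an incorrect one.

    Ties count as half a win. This is the simple pairwise O(n^2) form.
    """
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one correct and one incorrect answer")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def answer_or_abstain(answer: str, confidence: float, threshold: float = 0.5) -> Optional[str]:
    """Return the answer only if confidence reaches the threshold (assumed value)."""
    return answer if confidence >= threshold else None


def select_model(candidates: Dict[str, Tuple[str, float]]) -> Tuple[str, str]:
    """Pick the (model name, answer) pair with the highest confidence (assumed rule)."""
    name = max(candidates, key=lambda k: candidates[k][1])
    return name, candidates[name][0]
```

An AUROC of 1.0 means every correct answer was assigned higher confidence than every incorrect one, while 0.5 corresponds to a confidence score that carries no ranking information.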

📝 Abstract

Retrieval-augmented reasoning (RAR) is a recent evolution of retrieval-augmented generation (RAG) that employs multiple reasoning steps for retrieval and generation. While effective for some complex queries, RAR remains vulnerable to errors and misleading outputs. Uncertainty quantification (UQ) offers methods to estimate the confidence of a system's outputs. These methods, however, typically target simple queries with no retrieval or single-step retrieval and do not properly handle the RAR setup. Accurate uncertainty estimation for RAR requires accounting for all sources of uncertainty, including those arising from retrieval and from generation. In this paper, we account for all these sources and introduce Retrieval-Augmented Reasoning Consistency (R2C), a novel UQ method for RAR. The core idea of R2C is to perturb the multi-step reasoning process by applying various actions to reasoning steps. These perturbations alter the retriever's input, which shifts its output and consequently modifies the generator's input at the next step. Through this iterative feedback loop, the retriever and generator continuously reshape one another's inputs, enabling us to capture uncertainty arising from both components. Experiments on five popular RAR systems across diverse QA datasets show that R2C improves AUROC by over 5% on average compared to state-of-the-art UQ baselines. Extrinsic evaluations using R2C as an external signal further confirm its effectiveness on two downstream tasks: in Abstention, it achieves ~5% gains in both F1Abstain and AccAbstain; in Model Selection, it improves exact match by ~7% over single models and ~3% over selection methods.
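The perturb-and-compare idea can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the authors' implementation: the `ANSWER:` prefix convention, the agreement-rate score, the perturbation actions, and the toy retriever and generator are all hypothetical stand-ins.

```python
import random
from typing import Callable, List, Sequence, Tuple

ANSWER = "ANSWER:"  # assumed convention: the generator marks a final answer this way
Retriever = Callable[[str], str]             # search query -> retrieved text
Generator = Callable[[str, List[str]], str]  # (question, evidence) -> next query or answer
Action = Callable[[str], str]                # perturbation applied to one reasoning step


def run_rar(question: str, retrieve: Retriever, generate: Generator,
            prefix: Sequence[str] = (), max_steps: int = 4) -> Tuple[str, List[str]]:
    """Run the retrieve/generate loop, optionally continuing from given steps."""
    steps = list(prefix)
    evidence = [retrieve(s) for s in steps]
    for _ in range(max_steps):
        out = generate(question, evidence)
        if out.startswith(ANSWER):
            return out[len(ANSWER):].strip(), steps
        steps.append(out)               # the step becomes the retriever's input
        evidence.append(retrieve(out))  # retrieved text feeds the next generation
    return "", steps


def consistency_confidence(question: str, retrieve: Retriever, generate: Generator,
                           actions: Sequence[Action], n_samples: int = 8,
                           seed: int = 0) -> Tuple[str, float]:
    """Confidence = share of perturbed re-runs that reproduce the original answer."""
    rng = random.Random(seed)
    original, trace = run_rar(question, retrieve, generate)
    if not trace:  # no reasoning step to perturb
        return original, 1.0
    agree = 0
    for _ in range(n_samples):
        i = rng.randrange(len(trace))
        action = rng.choice(list(actions))
        perturbed = trace[:i] + [action(trace[i])]
        answer, _ = run_rar(question, retrieve, generate, prefix=perturbed)
        agree += int(answer == original)
    return original, agree / n_samples


# Toy components, purely illustrative.
CORPUS = {"capital of france": "Paris is the capital of France."}


def toy_retrieve(query: str) -> str:
    return CORPUS.get(query.lower().strip(), "")


def toy_generate(question: str, evidence: List[str]) -> str:
    if not evidence:
        return "capital of France"
    return "ANSWER: Paris" if "Paris" in evidence[-1] else "ANSWER: unknown"
```

With these toy components, a perturbation that only changes letter case leaves retrieval intact, so every re-run agrees with the original answer. A perturbation that appends an unrelated word breaks retrieval and drives the agreement rate to zero, which is how a change in the retriever's input surfaces in the final answer.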

Problem

Research questions and friction points this paper is trying to address.

Quantifying uncertainty in multi-step retrieval-augmented reasoning systems

Addressing error vulnerabilities in complex query reasoning processes

Developing consistent uncertainty estimation for retrieval-generation feedback loops

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces R2C, a consistency-based uncertainty quantification method for RAR

Perturbs reasoning steps to capture uncertainty from both retrieval and generation

Iterative feedback reshapes retriever and generator inputs
