QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation

📅 2025-12-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing dynamic retrieval-augmented generation (RAG) methods rely on internal LLM confidence signals—e.g., logits or entropy—to trigger retrieval; however, these signals suffer from poor calibration and unreliability, frequently inducing hallucinations. This paper proposes *Corpus-Grounded Dynamic RAG*, a paradigm that abandons model-internal signals entirely and instead quantifies uncertainty objectively via pre-training corpus statistics—specifically, entity frequency and co-occurrence patterns—enabling model-agnostic, adaptive retrieval triggering. Key technical contributions include: (1) Infini-gram indexing for millisecond-latency queries over a 4-trillion-token corpus; (2) long-tail entity identification; (3) co-occurrence-based verification; and (4) a dynamic gating mechanism. On multi-hop QA, the method improves exact match (EM) by 5–12 points on OLMo-2; achieves up to +14 EM when transferred across models with undisclosed pre-training data (Llama, Qwen, GPT); and generalizes robustly to biomedical QA.

📝 Abstract
Dynamic Retrieval-Augmented Generation adaptively determines when to retrieve during generation to mitigate hallucinations in large language models (LLMs). However, existing methods rely on model-internal signals (e.g., logits, entropy), which are fundamentally unreliable because LLMs are typically ill-calibrated and often exhibit high confidence in erroneous outputs. We propose QuCo-RAG, which shifts from subjective confidence to objective statistics computed from pre-training data. Our method quantifies uncertainty through two stages: (1) before generation, we identify low-frequency entities indicating long-tail knowledge gaps; (2) during generation, we verify entity co-occurrence in the pre-training corpus, where zero co-occurrence often signals hallucination risk. Both stages leverage Infini-gram for millisecond-latency queries over 4 trillion tokens, triggering retrieval when uncertainty is high. Experiments on multi-hop QA benchmarks show QuCo-RAG achieves EM gains of 5–12 points over state-of-the-art baselines with OLMo-2 models, and transfers effectively to models with undisclosed pre-training data (Llama, Qwen, GPT), improving EM by up to 14 points. Domain generalization on biomedical QA further validates the robustness of our paradigm. These results establish corpus-grounded verification as a principled, practically model-agnostic paradigm for dynamic RAG. Our code is publicly available at https://github.com/ZhishanQ/QuCo-RAG.
Problem

Research questions and friction points this paper is trying to address.

Mitigating hallucinations in large language models during generation
Replacing unreliable internal confidence signals with objective data statistics
Detecting knowledge gaps through pre-training corpus frequency analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shifts from subjective confidence to objective pre-training statistics
Quantifies uncertainty via low-frequency entities and co-occurrence verification
Uses Infini-gram for millisecond queries over 4 trillion tokens
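The two-stage gating described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: corpus statistics are mocked with a dictionary standing in for Infini-gram count queries, and the function names and the frequency threshold are assumptions chosen for clarity.

```python
# Hedged sketch of corpus-grounded retrieval triggering (QuCo-RAG style).
# A dict stands in for Infini-gram queries over the pre-training corpus;
# FREQ_THRESHOLD is an illustrative value, not the paper's actual setting.

FREQ_THRESHOLD = 1000  # below this corpus count, treat an entity as long-tail


def entity_count(corpus_counts, entity):
    """Stand-in for an Infini-gram frequency query on a single entity."""
    return corpus_counts.get(entity, 0)


def cooccurrence_count(corpus_counts, e1, e2):
    """Stand-in for an Infini-gram co-occurrence query on an entity pair."""
    return corpus_counts.get((e1, e2), corpus_counts.get((e2, e1), 0))


def should_retrieve(corpus_counts, question_entities, generated_pair=None):
    """Dynamic gating: trigger retrieval when corpus statistics signal uncertainty.

    Stage 1 (before generation): any long-tail entity in the question
    indicates a likely knowledge gap.
    Stage 2 (during generation): zero co-occurrence between an entity pair
    the model just produced often signals hallucination risk.
    """
    # Stage 1: long-tail entity detection
    if any(entity_count(corpus_counts, e) < FREQ_THRESHOLD
           for e in question_entities):
        return True
    # Stage 2: co-occurrence verification
    if generated_pair is not None and \
            cooccurrence_count(corpus_counts, *generated_pair) == 0:
        return True
    return False
```

For example, a question mentioning only a rare entity (corpus count below the threshold) triggers retrieval before generation, while a generated claim pairing two entities that never co-occur in the corpus triggers retrieval mid-generation.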