Principled Context Engineering for RAG: Statistical Guarantees via Conformal Prediction

📅 2025-11-21
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
In Retrieval-Augmented Generation (RAG), long and noisy retrieved contexts often exceed the effective attention capacity of large language models (LLMs), degrading generation quality; existing pre-filtering methods rely on heuristic rules or uncalibrated confidence scores, lacking statistical guarantees. This paper proposes the first conformal prediction framework for RAG context filtering, jointly leveraging embedding similarity and LLM-based scoring functions to enable multi-granularity relevance assessment and redundancy pruning, ensuring statistically reliable evidence retention under controllable coverage. The method is model-agnostic, theoretically rigorous, and requires no fine-tuning. Experiments on NeuCLIR and RAGTIME demonstrate that, after compressing contexts by 2–3×, target coverage remains stable while ARGUE F1 improves significantly under strict filtering, validating effective removal of irrelevant and redundant content.

๐Ÿ“ Abstract
Retrieval-Augmented Generation (RAG) enhances factual grounding in large language models (LLMs) by incorporating retrieved evidence, but LLM accuracy declines when long or noisy contexts exceed the model's effective attention span. Existing pre-generation filters rely on heuristics or uncalibrated LLM confidence scores, offering no statistical control over retained evidence. We evaluate and demonstrate context engineering through conformal prediction, a coverage-controlled filtering framework that removes irrelevant content while preserving recall of supporting evidence. Using both embedding- and LLM-based scoring functions, we test this approach on the NeuCLIR and RAGTIME collections. Conformal filtering consistently meets its target coverage, ensuring that a specified fraction of relevant snippets are retained, and reduces retained context by 2-3x relative to unfiltered retrieval. On NeuCLIR, downstream factual accuracy measured by ARGUE F1 improves under strict filtering and remains stable at moderate coverage, indicating that most discarded material is redundant or irrelevant. These results demonstrate that conformal prediction enables reliable, coverage-controlled context reduction in RAG, offering a model-agnostic and principled approach to context engineering.
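The coverage-controlled filtering the abstract describes can be illustrated with a minimal split-conformal sketch: calibrate a score threshold on snippets known to be relevant, then keep only test snippets that clear it. All function names, the toy data, and the specific calibration recipe below are illustrative assumptions, not the paper's actual code.

```python
# Minimal sketch of coverage-controlled context filtering via split
# conformal prediction. Assumes a held-out calibration set of snippets
# labeled relevant, each with a relevance score (embedding- or
# LLM-based); names and data are hypothetical.
import numpy as np

def conformal_threshold(cal_relevant_scores, alpha):
    """Return a threshold tau such that, for an exchangeable new
    relevant snippet, P(score >= tau) >= 1 - alpha (marginal coverage).
    """
    n = len(cal_relevant_scores)
    # Finite-sample corrected rank: the floor(alpha * (n + 1))-th
    # smallest calibration score.
    k = int(np.floor(alpha * (n + 1)))
    if k < 1:
        return -np.inf  # too few calibration points: retain everything
    return np.sort(np.asarray(cal_relevant_scores))[k - 1]

def filter_context(snippets, scores, tau):
    """Keep only snippets whose relevance score clears the threshold."""
    return [s for s, sc in zip(snippets, scores) if sc >= tau]

# Toy usage: 100 calibration scores, target coverage 90% (alpha = 0.1).
cal = [i / 100 for i in range(100)]
tau = conformal_threshold(cal, alpha=0.1)  # 10th-smallest score: 0.09
kept = filter_context(["a", "b", "c"], [0.05, 0.09, 0.8], tau)  # ["b", "c"]
```

Filtering at `tau` discards low-scoring snippets while guaranteeing, marginally over calibration draws, that at least a `1 - alpha` fraction of relevant snippets survive, which is the coverage property the paper reports holding after 2-3x context compression.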
Problem

Research questions and friction points this paper is trying to address.

Addresses accuracy decline in LLMs from long noisy contexts
Provides statistical guarantees for evidence retention in RAG systems
Enables coverage-controlled context reduction while preserving relevant information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conformal prediction enables coverage-controlled context filtering
Reduces retained context by 2-3x while preserving recall
Model-agnostic framework using embedding- and LLM-based scoring