🤖 AI Summary
This work addresses the fragility of existing retrieval-augmented generation (RAG) systems under noisy retrieval, where models struggle to select high-quality evidence segments that align with the generator's reasoning capabilities, even when correct evidence is present. To this end, we propose BAR-RAG, the first approach that explicitly aligns evidence selection with the generator's capability boundary by introducing a boundary-aware reranker. This reranker prioritizes "just-right" evidence: sufficiently informative to support reasoning, yet neither trivially simple nor unanswerable. BAR-RAG employs generator-feedback-driven reinforcement learning and a two-stage fine-tuning strategy to mitigate train-test distribution shift. Evaluated across multiple knowledge-intensive question answering benchmarks, BAR-RAG achieves an average improvement of 10.3% over current RAG and reranking methods, demonstrating significantly enhanced robustness in noisy retrieval settings.
📝 Abstract
Retrieval-Augmented Generation (RAG) systems remain brittle under realistic retrieval noise, even when the required evidence appears in the top-K results. A key reason is that retrievers and rerankers optimize solely for relevance, often selecting either trivial, answer-revealing passages or evidence that lacks the critical information required to answer the question, without considering whether the evidence is suitable for the generator. We propose BAR-RAG, which reframes the reranker as a boundary-aware evidence selector that targets the generator's Goldilocks Zone -- evidence that is neither trivially easy nor fundamentally unanswerable for the generator, but is challenging yet sufficient for inference and thus provides the strongest learning signal. BAR-RAG trains the selector with reinforcement learning using generator feedback, and adopts a two-stage pipeline that fine-tunes the generator under the induced evidence distribution to mitigate the distribution mismatch between training and inference. Experiments on knowledge-intensive question answering benchmarks show that BAR-RAG consistently improves end-to-end performance under noisy retrieval, achieving an average gain of 10.3 percent over strong RAG and reranking baselines while substantially improving robustness. Code is publicly available at https://github.com/GasolSun36/BAR-RAG.
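To make the Goldilocks Zone idea concrete, here is a minimal sketch of what a boundary-aware reward signal for the evidence selector could look like. All names, the reward values, and the toy generator are illustrative assumptions for this sketch, not the authors' implementation: the intuition is simply that evidence earns the highest reward when the generator answers correctly *with* it but not *without* it.

```python
# Illustrative sketch (not the BAR-RAG codebase): a boundary-aware reward
# that prefers evidence in the generator's "Goldilocks Zone" -- answerable
# WITH the evidence but not from parametric knowledge alone.

def answers_correctly(generator, question, evidence, gold):
    """Does the generator produce the gold answer given this evidence?"""
    return generator(question, evidence) == gold

def boundary_aware_reward(generator, question, evidence, gold):
    """Assumed reward shape:
    - generator fails even with evidence -> 0.0 (insufficient / unanswerable)
    - generator succeeds without evidence -> 0.2 (trivial for this generator)
    - generator succeeds only with evidence -> 1.0 (just right)
    """
    with_ev = answers_correctly(generator, question, evidence, gold)
    without_ev = answers_correctly(generator, question, None, gold)
    if not with_ev:
        return 0.0
    if without_ev:
        return 0.2
    return 1.0

def toy_generator(question, evidence):
    """Toy stand-in generator: knows one fact parametrically,
    otherwise can only answer if the evidence contains the answer."""
    parametric = {"capital of France?": "Paris"}
    if question in parametric:
        return parametric[question]
    if evidence and "Canberra" in evidence:
        return "Canberra"
    return "unknown"
```

In BAR-RAG this kind of generator-feedback reward would drive reinforcement learning of the reranker, with the second fine-tuning stage adapting the generator to the evidence distribution the trained selector induces.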