🤖 AI Summary
This work addresses the lack of evaluation frameworks for assessing how retrieval-augmented generation (RAG) systems adapt after user or expert feedback. It introduces the "feedback adaptation" problem setting, quantifying adaptation speed and reliability through two metrics: correction lag and post-feedback performance. To integrate feedback at inference time without retraining, the authors propose PatchRAG, which combines semantic relevance analysis with behavioral change detection to achieve immediate corrections and cross-query semantic generalization. Experimental results show that PatchRAG significantly outperforms baseline methods, correcting errors without delay while generalizing reliably to semantically related queries after feedback.
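The paper does not specify PatchRAG's implementation beyond "incorporating feedback at inference time without retraining," but the general idea can be sketched as a store of corrective patches that are matched to new queries by semantic similarity. The class name `FeedbackPatchStore`, the similarity threshold, and the bag-of-words embedding below are all illustrative assumptions, not the authors' method; a real system would use a learned sentence encoder.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" for illustration only;
    # a deployed system would use a sentence encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class FeedbackPatchStore:
    """Hypothetical store of corrections, applied to semantically similar queries."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.patches = []  # list of (query_embedding, correction) pairs

    def add_feedback(self, query: str, correction: str) -> None:
        # Recording a patch is the only "update"; no model retraining occurs,
        # so the correction takes effect on the very next query (zero lag).
        self.patches.append((embed(query), correction))

    def lookup(self, query: str):
        # Return the best-matching correction above the similarity threshold,
        # which lets one patch generalize across paraphrased queries.
        q = embed(query)
        best, best_sim = None, self.threshold
        for emb, correction in self.patches:
            sim = cosine(q, emb)
            if sim >= best_sim:
                best, best_sim = correction, sim
        return best

store = FeedbackPatchStore()
store.add_feedback("who wrote the RAG paper", "Lewis et al., 2020")
# A semantically related query is patched without any retraining.
print(store.lookup("who are the authors of the RAG paper"))
```

At query time, a matched patch would override or augment the retrieved context; unmatched queries fall through to the unmodified RAG pipeline.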
📝 Abstract
Retrieval-Augmented Generation (RAG) systems are typically evaluated under static assumptions, even though they are frequently corrected through user or expert feedback in deployment. Existing evaluation protocols focus on overall accuracy and fail to capture how systems adapt once feedback is introduced. We introduce feedback adaptation as a problem setting for RAG systems, asking how effectively and how quickly corrective feedback propagates to future queries. To make this behavior measurable, we propose two evaluation axes: correction lag, which captures the delay between feedback provision and behavioral change, and post-feedback performance, which measures reliability on semantically related queries after feedback. Using these metrics, we show that training-based approaches achieve reliable adaptation only at the cost of delayed correction. We further propose PatchRAG, a minimal inference-time instantiation that incorporates feedback without retraining, and demonstrate immediate correction and strong post-feedback generalization under the proposed evaluation. Our results highlight feedback adaptation as a previously overlooked dimension of RAG system behavior in interactive settings.
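The abstract defines the two axes only informally, so one plausible formalization over a logged sequence of query outcomes is sketched below. The function names, the boolean-outcome encoding, and the convention that feedback arrives immediately after a given query index are assumptions for illustration, not the paper's exact definitions.

```python
def correction_lag(outcomes: list, feedback_index: int):
    """Queries after feedback until the first correct answer.

    `outcomes[i]` is True if related query i was answered correctly;
    feedback is given immediately after query `feedback_index`.
    Returns None if the system never corrects itself.
    A lag of 0 means the very next related query was already correct.
    """
    for lag, correct in enumerate(outcomes[feedback_index + 1:]):
        if correct:
            return lag
    return None

def post_feedback_performance(outcomes: list, feedback_index: int) -> float:
    """Accuracy on semantically related queries issued after feedback."""
    after = outcomes[feedback_index + 1:]
    return sum(after) / len(after) if after else 0.0

# Feedback arrives after the second query; the system recovers two queries later.
log = [False, False, False, False, True, True]
print(correction_lag(log, feedback_index=1))             # 2
print(post_feedback_performance(log, feedback_index=1))  # 0.5
```

Under this formalization, an inference-time method like PatchRAG would target a lag of 0, while a retraining-based method would show a nonzero lag but potentially high post-feedback accuracy once the update lands.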