QE-RAG: A Robust Retrieval-Augmented Generation Benchmark for Query Entry Errors

📅 2025-04-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing RAG methods perform well on clean user queries but lack robustness against common real-world input errors, such as keyboard proximity, visual similarity, and typographical mistakes; systematic evaluation and improvement mechanisms for this setting remain underexplored. To address this gap, the authors introduce QE-RAG, the first benchmark dedicated to evaluating RAG robustness under diverse query entry errors, covering six datasets with controllable error injection rates (20% and 40%). Methodologically, they propose a contrastive learning-driven training paradigm for robust retrievers and design a retrieval-augmented query correction module. The approach significantly enhances fault tolerance across sequential, branching, and iterative RAG architectures. Empirical results demonstrate consistent performance gains in both in-domain and cross-domain settings, without requiring modifications to existing RAG pipelines, ensuring full compatibility with current deployments.

📝 Abstract
Retrieval-augmented generation (RAG) has become a widely adopted approach for enhancing the factual accuracy of large language models (LLMs). While current benchmarks evaluate the performance of RAG methods from various perspectives, they share a common assumption that user queries used for retrieval are error-free. However, in real-world interactions between users and LLMs, query entry errors such as keyboard proximity errors, visual similarity errors, and spelling errors are frequent. The impact of these errors on current RAG methods remains largely unexplored. To bridge this gap, we propose QE-RAG, the first robust RAG benchmark designed specifically to evaluate performance against query entry errors. We augment six widely used datasets by injecting three common types of query entry errors into randomly selected user queries at rates of 20% and 40%, simulating typical user behavior in real-world scenarios. We analyze the impact of these errors on LLM outputs and find that corrupted queries degrade model performance, which can be mitigated through query correction and training a robust retriever for retrieving relevant documents. Based on these insights, we propose a contrastive learning-based robust retriever training method and a retrieval-augmented query correction method. Extensive in-domain and cross-domain experiments reveal that: (1) state-of-the-art RAG methods, including sequential, branching, and iterative methods, exhibit poor robustness to query entry errors; (2) our method significantly enhances the robustness of RAG when handling query entry errors, and it is compatible with existing RAG methods, further improving their robustness.
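The abstract describes injecting three error types (keyboard proximity, visual similarity, spelling) into a fraction of queries. A minimal sketch of such an injector; the confusion maps, function names, and single-edit policy are illustrative assumptions, not the paper's actual protocol:

```python
import random

# Hypothetical confusion sets; the benchmark's real maps are not given here.
KEYBOARD_NEIGHBORS = {"a": "qwsz", "e": "wrds", "o": "ipkl", "s": "awed"}
VISUAL_CONFUSIONS = {"l": "1", "o": "0", "i": "l", "m": "rn"}

def keyboard_proximity_error(word):
    # Replace one character with an adjacent key on a QWERTY layout.
    idxs = [i for i, c in enumerate(word) if c in KEYBOARD_NEIGHBORS]
    if not idxs:
        return word
    i = random.choice(idxs)
    return word[:i] + random.choice(KEYBOARD_NEIGHBORS[word[i]]) + word[i + 1:]

def visual_similarity_error(word):
    # Swap one character for a visually similar glyph.
    idxs = [i for i, c in enumerate(word) if c in VISUAL_CONFUSIONS]
    if not idxs:
        return word
    i = random.choice(idxs)
    return word[:i] + VISUAL_CONFUSIONS[word[i]] + word[i + 1:]

def spelling_error(word):
    # Drop one character, a simple typographical mistake.
    if len(word) < 2:
        return word
    i = random.randrange(len(word))
    return word[:i] + word[i + 1:]

def corrupt_queries(queries, rate=0.2, seed=0):
    """Inject one randomly chosen error type into `rate` of the queries."""
    rng = random.Random(seed)
    error_fns = [keyboard_proximity_error, visual_similarity_error, spelling_error]
    out = []
    for q in queries:
        if rng.random() < rate:
            words = q.split()
            j = rng.randrange(len(words))
            words[j] = rng.choice(error_fns)(words[j])
            out.append(" ".join(words))
        else:
            out.append(q)
    return out
```

With `rate=0.2` or `rate=0.4` this mirrors the benchmark's 20% and 40% injection settings; a fixed seed keeps the corrupted splits reproducible.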
Problem

Research questions and friction points this paper is trying to address.

Evaluates RAG robustness against query entry errors
Proposes QE-RAG benchmark for error-prone queries
Enhances RAG robustness via correction and training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive learning-based robust retriever training
Retrieval-augmented query correction method
Benchmark with injected query entry errors
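The robust-retriever contribution trains the encoder so that a corrupted query embeds close to its clean counterpart and relevant documents. A minimal InfoNCE-style contrastive loss is one plausible instantiation of that idea (a sketch, not the paper's exact objective; the temperature value is an assumption):

```python
import numpy as np

def info_nce_loss(query_emb, pos_emb, neg_embs, temperature=0.05):
    """Contrastive (InfoNCE) loss: pull the corrupted-query embedding toward
    the positive (clean query or relevant document), push it from negatives.
    Shapes: query_emb (d,), pos_emb (d,), neg_embs (n, d)."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    # Similarity logits with the positive at index 0.
    logits = np.array([cos(query_emb, pos_emb)] +
                      [cos(query_emb, n) for n in neg_embs]) / temperature
    logits -= logits.max()  # numerical stability before softmax
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```

Minimizing this loss over (corrupted query, clean positive, negatives) triples yields an error-tolerant retriever without touching the rest of the RAG pipeline, consistent with the compatibility claim above.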
Kepu Zhang
Renmin University of China
Search · LLM · Recommendation · Legal AI
Zhongxiang Sun
Renmin University of China
Search · Recommendation · LLM · Legal
Weijie Yu
School of Information Technology and Management, University of International Business and Economics, Beijing, China
Xiaoxue Zang
Kuaishou Technology
Recommender System · NLP · Dialogue · Multimodal Modeling
Kai Zheng
Kuaishou Technology Co., Ltd., Beijing, China
Yang Song
Kuaishou Technology Co., Ltd., Beijing, China
Han Li
Kuaishou Technology Co., Ltd., Beijing, China
Jun Xu
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China