TrustRAG: Enhancing Robustness and Trustworthiness in RAG

📅 2025-01-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
RAG systems are vulnerable to corpus poisoning attacks, which corrupt retrieved content—yielding erroneous or irrelevant passages—and thereby degrade response accuracy and security. To address this, we propose a plug-and-play, training-free, two-stage robust filtering framework. In Stage I, K-means clustering is applied to retrieved passages for semantic grouping, followed by removal of outlier clusters likely containing malicious content. In Stage II, a hybrid scoring mechanism integrates cosine similarity with ROUGE-L–based self-assessment and content consistency verification to further prune irrelevant items. The method is model-agnostic—compatible with any open- or closed-source LLM—without requiring fine-tuning. Extensive experiments across multiple LLMs (e.g., Llama-3, Qwen) and benchmarks (Poisoned-RAG, RAGBench) demonstrate substantial improvements: +18.7% retrieval accuracy, 23% reduction in inference latency, and suppression of attack success rate to <5%. Our implementation is publicly available.
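The two-stage filter described in the summary can be sketched as follows. This is a minimal illustration, not the paper's implementation: the cluster count, score weighting, and threshold are assumptions, and the "outlier cluster = smallest cluster" heuristic is a simplification of the paper's Stage I.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity


def lcs_len(a, b):
    # Longest common subsequence length over two token lists (for ROUGE-L).
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] else max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n]


def rouge_l(candidate, reference):
    # ROUGE-L F1 between a passage and a reference answer.
    c, r = candidate.split(), reference.split()
    if not c or not r:
        return 0.0
    lcs = lcs_len(c, r)
    prec, rec = lcs / len(c), lcs / len(r)
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)


def stage_one(embeddings, n_clusters=2):
    # Stage I: cluster passage embeddings and drop the smallest cluster,
    # treated here as the likely poisoned/outlier group.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)
    sizes = np.bincount(labels)
    return labels != sizes.argmin()


def stage_two(query_emb, passage_embs, passages, internal_answer, sim_w=0.5, threshold=0.3):
    # Stage II: hybrid score = cosine similarity to the query, blended with
    # ROUGE-L agreement against the model's own (internal-knowledge) answer.
    sims = cosine_similarity(query_emb.reshape(1, -1), passage_embs)[0]
    scores = [sim_w * s + (1 - sim_w) * rouge_l(p, internal_answer)
              for s, p in zip(sims, passages)]
    return [i for i, sc in enumerate(scores) if sc >= threshold]
```

Passages surviving both stages would then be passed to the LLM as context; because the filter only touches the retrieval side, it stays plug-and-play with respect to the generator.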

📝 Abstract
Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user queries. However, these systems remain vulnerable to corpus poisoning attacks that can significantly degrade LLM performance through the injection of malicious content. To address these challenges, we propose TrustRAG, a robust framework that systematically filters compromised and irrelevant content before it reaches the language model. Our approach implements a two-stage defense mechanism: first, it employs K-means clustering to identify potential attack patterns in retrieved documents based on their semantic embeddings, effectively isolating suspicious content. Second, it leverages cosine similarity and ROUGE metrics to detect malicious documents while resolving discrepancies between the model's internal knowledge and external information through a self-assessment process. TrustRAG functions as a plug-and-play, training-free module that integrates seamlessly with any language model, whether open or closed-source, maintaining high contextual relevance while strengthening defenses against attacks. Through extensive experimental validation, we demonstrate that TrustRAG delivers substantial improvements in retrieval accuracy, efficiency, and attack resistance compared to existing approaches across multiple model architectures and datasets. We have made TrustRAG available as open-source software at https://github.com/HuichiZhou/TrustRAG.
Problem

Research questions and friction points this paper is trying to address.

RAG systems
External knowledge errors
Information accuracy and security
Innovation

Methods, ideas, or system contributions that make the work stand out.

TrustRAG
Information Filtering
Enhanced Security
Huichi Zhou
University College London
AI4Science

Kin-Hei Lee
Imperial College London, London, UK

Zhonghao Zhan
Cornell University
Networks, Human-Computer Interaction, Data Mining

Yue Chen
Peking University, China

Zhenhao Li
Imperial College London, London, UK