🤖 AI Summary
This work addresses corpus poisoning attacks on retrieval-augmented generation (RAG) systems by proposing ProGRank, a training-free, deployment-friendly defense that operates at the retrieval stage. ProGRank applies mild random perturbations to each query-passage pair and extracts probe gradients from a small fixed subset of the retriever's parameters. From these gradients it derives two instability signals, representational consistency and dispersion risk, and combines them with a score gate in a reranking step intended to keep poisoned passages out of the top-ranked results. The method leaves the original passage content unchanged, requires no retraining, and supports a surrogate-based variant when the deployed retriever is inaccessible. Experiments across three datasets, three dense retrievers, and multiple poisoning attacks indicate that ProGRank improves robustness and achieves a favorable robustness-utility trade-off in both retrieval-only and end-to-end settings, while remaining competitive under adaptive evasive attacks.
📝 Abstract
Retrieval-Augmented Generation (RAG) improves the reliability of large language model applications by grounding generation in retrieved evidence, but it also introduces a new attack surface: corpus poisoning. In this setting, an adversary injects or edits passages so that they are ranked into the Top-$K$ results for target queries and then affect downstream generation. Existing defences against corpus poisoning often rely on content filtering, auxiliary models, or generator-side reasoning, which can make deployment more difficult. We propose ProGRank, a post hoc, training-free retriever-side defence for dense-retriever RAG. ProGRank stress-tests each query--passage pair under mild randomized perturbations and extracts probe gradients from a small fixed parameter subset of the retriever. From these probe gradients, it derives two instability signals, representational consistency and dispersion risk, and combines them with a score gate in a reranking step. ProGRank preserves the original passage content, requires no retraining, and also supports a surrogate-based variant when the deployed retriever is unavailable. Extensive experiments across three datasets, three dense retriever backbones, representative corpus poisoning attacks, and both retrieval-stage and end-to-end settings show that ProGRank provides stronger defence performance and a favorable robustness--utility trade-off. It also remains competitive under adaptive evasive attacks.
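
The pipeline described in the abstract (perturb, probe gradients, two instability signals, score gate, rerank) can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not give the formulas, so everything concrete here is an assumption made for illustration. That includes the element-wise toy retriever with parameter vector `w`, Gaussian noise as the perturbation, cosine agreement with the mean gradient as "representational consistency", the coefficient of variation of gradient norms as "dispersion risk", a median-score gate, and the hypothetical parameter names `sigma`, `lam`, and `gate_quantile`.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    denom = norm(a) * norm(b)
    return dot(a, b) / denom if denom > 0 else 0.0


def score(q, p, w):
    # ASSUMPTION: toy retriever, score = sum_i (w_i * q_i) * (w_i * p_i).
    return sum((wi * qi) * (wi * pi) for wi, qi, pi in zip(w, q, p))


def probe_gradient(q, p, w):
    # Analytic gradient of the toy score w.r.t. w: d score / d w_i = 2 * w_i * q_i * p_i.
    # Here w stands in for the "small fixed parameter subset" of a real retriever.
    return [2.0 * wi * qi * pi for wi, qi, pi in zip(w, q, p)]


def perturb(v, sigma, rng):
    # ASSUMPTION: "mild randomized perturbation" modelled as additive Gaussian noise.
    return [x + rng.gauss(0.0, sigma) for x in v]


def instability(q, p, w, n_probes=16, sigma=0.05, seed=0):
    """Return (consistency, dispersion) of probe gradients under perturbation."""
    rng = random.Random(seed)
    grads = [
        probe_gradient(perturb(q, sigma, rng), perturb(p, sigma, rng), w)
        for _ in range(n_probes)
    ]
    mean_grad = [sum(g[i] for g in grads) / n_probes for i in range(len(w))]
    # ASSUMPTION: consistency = mean cosine similarity to the mean gradient.
    consistency = sum(cosine(g, mean_grad) for g in grads) / n_probes
    # ASSUMPTION: dispersion = std / mean of the gradient norms.
    norms = [norm(g) for g in grads]
    mean_norm = sum(norms) / n_probes
    var = sum((x - mean_norm) ** 2 for x in norms) / n_probes
    dispersion = math.sqrt(var) / mean_norm if mean_norm > 0 else 0.0
    return consistency, dispersion


def rerank(q, passages, w, top_k=3, lam=1.0, gate_quantile=0.5):
    """Penalise unstable, high-scoring passages and return top_k passage indices."""
    base = [score(q, p, w) for p in passages]
    # ASSUMPTION: the score gate only probes passages at or above a score quantile.
    gate = sorted(base)[int(gate_quantile * (len(base) - 1))]
    adjusted = []
    for idx, (p, s) in enumerate(zip(passages, base)):
        penalty = 0.0
        if s >= gate:
            c, d = instability(q, p, w)
            penalty = lam * ((1.0 - c) + d)
        adjusted.append((s - penalty, idx))
    adjusted.sort(reverse=True)
    return [idx for _, idx in adjusted[:top_k]]
```

Calling `rerank(query_vec, passage_vecs, w, top_k=5)` returns passage indices ordered by the penalised score. In a real dense retriever the gradient would come from automatic differentiation over selected encoder parameters instead of the closed form used here, and the actual signal definitions and gating rule should be taken from the paper itself.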