PoisonArena: Uncovering Competing Poisoning Attacks in Retrieval-Augmented Generation

📅 2025-05-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a realistic threat to retrieval-augmented generation (RAG) systems: competitive poisoning attacks by multiple adversaries, who inject mutually exclusive false information for the same query, competing for control over the generated answer. We introduce PoisonArena—the first standardized benchmark for this multi-attacker setting—and formally define the competitive threat model. To evaluate attacker efficacy under competition, we propose a Bradley–Terry–based metric, revealing the inadequacy of conventional single-attacker metrics (e.g., attack success rate) in adversarial environments. Through RAG architecture analysis, multi-agent adversarial modeling, and empirical evaluation on Natural Questions and MS MARCO, we demonstrate that most poisoning strategies—effective in isolation—suffer significant degradation under competition. Our work establishes "competitive robustness" as a core principle for RAG security assessment and releases an open-source, standardized benchmark framework.

📝 Abstract
Retrieval-Augmented Generation (RAG) systems, widely used to improve the factual grounding of large language models (LLMs), are increasingly vulnerable to poisoning attacks, where adversaries inject manipulated content into the retriever's corpus. While prior research has predominantly focused on single-attacker settings, real-world scenarios often involve multiple, competing attackers with conflicting objectives. In this work, we introduce PoisonArena, the first benchmark to systematically study and evaluate competing poisoning attacks in RAG. We formalize the multi-attacker threat model, where attackers vie to control the answer to the same query using mutually exclusive misinformation. PoisonArena leverages the Bradley-Terry model to quantify each method's competitive effectiveness in such adversarial environments. Through extensive experiments on the Natural Questions and MS MARCO datasets, we demonstrate that many attack strategies successful in isolation fail under competitive pressure. Our findings highlight the limitations of conventional evaluation metrics like Attack Success Rate (ASR) and F1 score and underscore the need for competitive evaluation to assess real-world attack robustness. PoisonArena provides a standardized framework to benchmark and develop future attack and defense strategies under more realistic, multi-adversary conditions. Project page: https://github.com/yxf203/PoisonArena.
Problem

Research questions and friction points this paper is trying to address.

Studying competing poisoning attacks in RAG systems
Evaluating attack robustness in multi-adversary scenarios
Developing a benchmark for realistic attack and defense strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces PoisonArena for multi-attacker RAG poisoning
Uses Bradley-Terry model to quantify attack effectiveness
Benchmarks attacks under competitive adversarial conditions
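The Bradley–Terry model scores each attack method with a latent strength π_i such that the probability method i's answer prevails over method j's is π_i / (π_i + π_j). As a minimal illustration of how such strengths could be fit from head-to-head outcomes (this is a generic Zermelo/MM iteration sketch, not the paper's released code; the win matrix below is hypothetical):

```python
def fit_bradley_terry(wins, n_iters=500, tol=1e-10):
    """Fit Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i][j] = number of head-to-head trials where method i's target
    answer prevailed over method j's. Returns strengths normalized to
    sum to 1, so p(i beats j) = pi[i] / (pi[i] + pi[j]).
    Uses the standard Zermelo/MM fixed-point update.
    """
    n = len(wins)
    pi = [1.0 / n] * n
    for _ in range(n_iters):
        new_pi = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (pi[i] + pi[j])
                        for j in range(n) if j != i)
            new_pi.append(total_wins / denom if denom > 0 else pi[i])
        s = sum(new_pi)
        new_pi = [p / s for p in new_pi]  # normalize for identifiability
        if max(abs(a - b) for a, b in zip(new_pi, pi)) < tol:
            return new_pi
        pi = new_pi
    return pi

# Hypothetical outcomes among three competing attack methods:
# method 0 wins most often, method 2 least often.
wins = [[0, 8, 9],
        [2, 0, 7],
        [1, 3, 0]]
strengths = fit_bradley_terry(wins)
```

An attack that looks strong in isolation (high single-attacker ASR) can still land a low π under this competitive scoring, which is the gap the benchmark is designed to expose.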
Liuji Chen
Institute of Automation, Chinese Academy of Sciences
LLM Agent · Trustworthy AI
Xiaofang Yang
Harbin Institute of Technology, Weihai, China
Yuanzhuo Lu
Harbin Institute of Technology, Weihai, China
Jinghao Zhang
Kuaishou Tech
Recommender Systems · Multimedia · Large Language Model
Xin Sun
New Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
Qiang Liu
New Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
Shu Wu
New Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
Jing Dong
New Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
Liang Wang
New Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences