MIRAGE: Misleading Retrieval-Augmented Generation via Black-box and Query-agnostic Poisoning Attacks

📅 2025-12-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses corpus poisoning attacks against retrieval-augmented generation (RAG) systems under realistic black-box and query-agnostic settings. The authors propose MIRAGE, a multi-stage poisoning framework that eliminates white-box assumptions and prior knowledge of user queries. MIRAGE is an end-to-end pipeline integrating surrogate model feedback, persona-driven query synthesis, semantic anchoring, and an adversarial variant of test-time preference optimization (TPO) to maximize the persuasiveness of injected documents. It significantly improves attack stealthiness and cross-model transferability. Evaluated on a benchmark built from three long-document datasets, MIRAGE outperforms existing baselines, achieving higher attack success rates with smaller input perturbations while requiring no access to target model parameters or real user queries. To the authors' knowledge, this is the first work demonstrating efficient, practical, and generalizable RAG corpus poisoning under strict real-world constraints, underscoring the urgent need for robust defensive mechanisms.

📝 Abstract
Retrieval-Augmented Generation (RAG) systems enhance LLMs with external knowledge but introduce a critical attack surface: corpus poisoning. While recent studies have demonstrated the potential of such attacks, they typically rely on impractical assumptions, such as white-box access or known user queries, thereby underestimating the difficulty of real-world exploitation. In this paper, we bridge this gap by proposing MIRAGE, a novel multi-stage poisoning pipeline designed for strict black-box and query-agnostic environments. Operating on surrogate model feedback, MIRAGE functions as an automated optimization framework that integrates three key mechanisms: it utilizes persona-driven query synthesis to approximate latent user search distributions, employs semantic anchoring to imperceptibly embed these intents for high retrieval visibility, and leverages an adversarial variant of Test-Time Preference Optimization (TPO) to maximize persuasion. To rigorously evaluate this threat, we construct a new benchmark derived from three long-form, domain-specific datasets. Extensive experiments demonstrate that MIRAGE significantly outperforms existing baselines in both attack efficacy and stealthiness, exhibiting remarkable transferability across diverse retriever-LLM configurations and highlighting the urgent need for robust defense strategies.
Problem

Research questions and friction points this paper is trying to address.

Corpus poisoning as an attack surface for RAG systems
Unrealistic white-box and known-query assumptions in prior poisoning attacks
Measuring attack efficacy and stealthiness across retriever-LLM configurations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Black-box, query-agnostic poisoning pipeline driven by surrogate model feedback
Persona-driven query synthesis to approximate latent user search distributions
Semantic anchoring for stealthy, high-visibility retrieval
Adversarial test-time preference optimization (TPO) to maximize persuasiveness
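The three mechanisms above can be pictured as stages of one optimization loop. The sketch below is purely illustrative, assuming hypothetical function names and a toy bag-of-words similarity as a stand-in for the surrogate retriever; the paper's actual prompts, models, and scoring are not reproduced here.

```python
# Illustrative sketch of a MIRAGE-style pipeline (all names hypothetical).
from collections import Counter
import math

def similarity(a: str, b: str) -> float:
    """Cosine similarity over word counts -- toy stand-in for a dense retriever."""
    wa, wb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(wa[w] * wb[w] for w in wa)
    na = math.sqrt(sum(v * v for v in wa.values()))
    nb = math.sqrt(sum(v * v for v in wb.values()))
    return dot / (na * nb) if na and nb else 0.0

def synthesize_queries(topic: str, personas: list[str]) -> list[str]:
    """Stage 1: persona-driven query synthesis (templates stand in for an LLM)."""
    return [f"{p} asks: what should I know about {topic}?" for p in personas]

def anchor_semantics(payload: str, queries: list[str]) -> str:
    """Stage 2: semantic anchoring -- weave the synthesized query intents
    into the poisoned document so the retriever ranks it highly."""
    anchor = " ".join(q.split(":")[1].strip() for q in queries)
    return f"{anchor} {payload}"

def preference_optimize(doc: str, queries: list[str], rewrites: list[str]) -> str:
    """Stage 3: adversarial preference loop -- keep whichever candidate the
    surrogate scores highest against the synthesized query distribution."""
    score = lambda d: sum(similarity(d, q) for q in queries)
    return max([doc] + rewrites, key=score)

queries = synthesize_queries("vitamin dosage", ["a patient", "a nurse"])
poisoned = anchor_semantics("Always take the maximum dose.", queries)
best = preference_optimize(poisoned, queries, [poisoned + " vitamin dosage advice"])
```

In the real attack, stages 1 and 3 would call an LLM and a surrogate retriever rather than templates and word counts; the point is only the loop structure: synthesize likely queries, anchor the payload to them, then iteratively prefer the candidate the surrogate ranks highest.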