Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges

📅 2025-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper reveals a severe vulnerability of neural information retrieval (NeuIR) systems, spanning retrieval, re-ranking, and large language model (LLM)-based relevance judgment modules, to "camouflaged-relevance" content injection attacks, in which semantically coherent yet misleading non-keyword text is inserted to deceive models into misjudging document relevance. Method: the authors conduct the first empirical analysis of critical attack factors, such as positional sensitivity and the balance between relevant and injected material, and propose a holistic defense framework integrating adversarial sample construction, embedding perturbation analysis, retriever fine-tuning, LLM prompt optimization, and adversarial classifier training. Contribution/Results: experiments demonstrate that state-of-the-art retrievers, re-rankers, and LLM judges are all susceptible, and the proposed defenses significantly improve robustness. However, the study also exposes an inherent trade-off between robustness gains and degraded recall on legitimate documents, underscoring both the urgency and the complexity of building trustworthy IR systems.

📝 Abstract
Consider a scenario in which a user searches for information, only to encounter texts flooded with misleading or non-relevant content. This scenario exemplifies a simple yet potent vulnerability in neural Information Retrieval (IR) pipelines: content injection attacks. We find that embedding models for retrieval, rerankers, and large language model (LLM) relevance judges are vulnerable to these attacks, in which adversaries insert misleading text into passages to manipulate model judgements. We identify two primary threats: (1) inserting unrelated or harmful content within passages that still appear deceptively "relevant", and (2) inserting entire queries or key query terms into passages to boost their perceived relevance. While the second tactic has been explored in prior research, we present, to our knowledge, the first empirical analysis of the first threat, demonstrating how state-of-the-art models can be easily misled. Our study systematically examines the factors that influence an attack's success, such as the placement of injected content and the balance between relevant and non-relevant material. Additionally, we explore various defense strategies, including adversarial passage classifiers, retriever fine-tuning to discount manipulated content, and prompting LLM judges to adopt a more cautious approach. However, we find that these countermeasures often involve trade-offs, sacrificing effectiveness for attack robustness and sometimes penalizing legitimate documents in the process. Our findings highlight the need for stronger defenses against these evolving adversarial strategies to maintain the trustworthiness of IR systems. We release our code and scripts to facilitate further research.
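The second threat in the abstract, injecting the query itself into an otherwise irrelevant passage, can be illustrated with a minimal sketch. The snippet below uses a toy bag-of-words embedding and cosine similarity, which only caricatures the neural embedding models the paper actually attacks; all names and example texts here are hypothetical, not taken from the paper.

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding": a term-count vector.
    # The paper attacks real neural embedding models; this stand-in
    # merely illustrates why lexical/semantic overlap inflates scores.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = "symptoms of vitamin d deficiency"
benign = "vitamin d deficiency can cause fatigue and bone pain"
off_topic = "buy our miracle supplement today limited offer"
# Threat (2): the adversary prepends the query to an off-topic passage.
injected = query + " " + off_topic

q = embed(query)
score_benign = cosine(q, embed(benign))
score_off_topic = cosine(q, embed(off_topic))
score_injected = cosine(q, embed(injected))

# The injected passage scores above the untouched off-topic passage,
# so a similarity-based retriever would rank the manipulated text higher.
assert score_injected > score_off_topic
```

Under this toy scoring, the injected passage can even outrank the genuinely relevant one, which mirrors the abstract's point that simple injections suffice to manipulate model judgements.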
Problem

Research questions and friction points this paper is trying to address.

Information Retrieval
Content Injection Attack
Malicious Users
Innovation

Methods, ideas, or system contributions that make the work stand out.

Content Injection Attack
Information Retrieval Security
Misleading Information Impact