🤖 AI Summary
This paper identifies a novel poisoning threat against multimodal Retrieval-Augmented Generation (M-RAG) systems for visual document retrieval: by injecting a single adversarial image, an attacker can trigger a universal Denial-of-Service (DoS) effect, causing that image to be erroneously retrieved as relevant for diverse queries and to dominate downstream generation.
Method: We propose the first gradient-based, cross-model, single-image universal poisoning method, compatible with mainstream vision encoders (CLIP, SigLIP) and multimodal large language models (e.g., LLaVA, Qwen-VL).
Contribution/Results: Our attack achieves >92% average retrieval misdirection across multiple state-of-the-art M-RAG systems. Crucially, we find that robustly trained embedding models inherently resist such attacks, reducing success rates to <5%. This exposes a fundamental architectural vulnerability in current M-RAG designs and points to embedding robustness as a promising defense direction.
📝 Abstract
Multimodal retrieval-augmented generation (M-RAG) has recently emerged as a method to mitigate hallucinations of large multimodal models (LMMs) by grounding them in a factual knowledge base (KB). However, M-RAG also introduces new attack vectors for adversaries who aim to disrupt the system by injecting malicious entries into the KB. In this work, we present a poisoning attack against M-RAG targeting visual document retrieval applications, where the KB contains images of document pages. Our objective is to craft a single image that is retrieved for a variety of different user queries and consistently influences the output produced by the generative model, thus creating a universal denial-of-service (DoS) attack against the M-RAG system. We demonstrate that while our attack is effective against a diverse range of widely used, state-of-the-art retrievers (embedding models) and generators (LMMs), it can also be ineffective against robust embedding models. Our attack not only highlights the vulnerability of M-RAG pipelines to poisoning attacks, but also sheds light on a fundamental weakness that potentially hinders their performance even in benign settings.
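To make the crafting objective concrete, the sketch below shows the generic form of such an optimization: gradient ascent on a single image's pixels to maximize its mean cosine similarity to a sample of query embeddings, so that a similarity-based retriever ranks it highly for many different queries. This is a minimal illustration under assumed interfaces (`vision_encoder`, `query_embeds` are placeholders), not the paper's actual loss, query sampling, or cross-model ensembling.

```python
import torch
import torch.nn.functional as F

def craft_universal_poison(vision_encoder, query_embeds,
                           image_size=(3, 224, 224), steps=300, lr=0.05):
    """Gradient-ascent sketch of a universal poisoning image.

    Optimizes one image whose embedding lies close to many query
    embeddings at once. Hypothetical interface: `vision_encoder` maps a
    batch of images to embeddings, `query_embeds` is a (num_queries, d)
    tensor of query embeddings from the paired text encoder.
    """
    x = torch.rand(1, *image_size, requires_grad=True)  # start from noise
    opt = torch.optim.Adam([x], lr=lr)
    q = F.normalize(query_embeds, dim=-1)               # unit query vectors
    for _ in range(steps):
        emb = F.normalize(vision_encoder(x.clamp(0, 1)), dim=-1)
        loss = -(emb @ q.T).mean()  # negative mean cosine similarity
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x.detach().clamp(0, 1)   # keep pixels in a valid range
```

In a real pipeline the query embeddings would come from held-out queries against the same encoder family as the retriever; averaging the loss over several encoders is one plausible way to obtain the cross-model transfer the paper reports.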