🤖 AI Summary
This work identifies a critical vulnerability of vision-language model (VLM)-based dense document retrievers, such as DSE and ColPali, to pixel-level adversarial attacks in screenshot-based retrieval. We systematically characterize and quantify a previously unreported pixel poisoning vulnerability at the visual input interface and propose three pixel poisoning methods that work by injecting a single image. Experiments show that injecting just one malicious screenshot into the corpus is enough to poison the top-10 results for 41.9% of queries on DSE and 26.4% on ColPali, rates that notably exceed those observed for equivalent attacks on text-only retrievers. Under targeted attacks on a small set of known queries, the success rate rises further, reaching 100% in certain cases. To our knowledge, this is the first study to extend adversarial robustness evaluation into the pixel space of cross-modal dense retrieval. Our work establishes a foundational benchmark and delivers a critical security warning for VLM-driven document search systems.
📝 Abstract
Recent advancements in dense retrieval have introduced vision-language model (VLM)-based retrievers, such as DSE and ColPali, which embed document screenshots as vectors to enable effective search and offer a simpler pipeline than traditional text-only methods. In this study, we propose three pixel poisoning attack methods designed to compromise VLM-based retrievers and evaluate their effectiveness under various attack settings and parameter configurations. Our empirical results demonstrate that injecting even a single adversarial screenshot into the retrieval corpus can significantly disrupt search results, poisoning the top-10 retrieved documents for 41.9% of queries in the case of DSE and 26.4% for ColPali. These vulnerability rates notably exceed those observed with equivalent attacks on text-only retrievers. Moreover, when targeting a small set of known queries, the attack success rate rises, achieving complete success in certain cases. By exposing the vulnerabilities inherent in vision-language models, this work highlights the potential risks associated with their deployment.
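The headline numbers above are top-10 poisoning rates: the share of queries for which the single injected screenshot lands among the ten highest-scoring documents. The sketch below shows one way such a rate could be computed. It is an illustration only, not the authors' evaluation code: it assumes single-vector embeddings scored by dot product, and the function name `topk_attack_success_rate` and all array names are hypothetical. A multi-vector retriever with late-interaction scoring would need a different similarity function.

```python
import numpy as np


def topk_attack_success_rate(query_emb, corpus_emb, adv_emb, k=10):
    """Fraction of queries for which the injected document ranks in the top-k.

    Assumes (hypothetically) one embedding vector per query and per document,
    scored by dot product.
    query_emb:  (Q, D) query embeddings
    corpus_emb: (N, D) embeddings of the clean corpus
    adv_emb:    (D,)   embedding of the single adversarial screenshot
    """
    # Score of every query against every clean document: shape (Q, N).
    clean_scores = query_emb @ corpus_emb.T
    # Score of every query against the injected document: shape (Q,).
    adv_scores = query_emb @ adv_emb
    # 0-based rank of the injected document for each query, i.e. the number
    # of clean documents that score strictly higher than it.
    rank = (clean_scores > adv_scores[:, None]).sum(axis=1)
    # The attack succeeds on a query when the injected document is in the top-k.
    return float((rank < k).mean())


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the paper.
    queries = np.array([[1.0, 0.0], [0.0, 1.0]])
    corpus = np.array([[0.9, 0.0], [0.5, 0.5], [0.0, 0.8]])
    adversarial = np.array([1.0, 0.1])
    print(topk_attack_success_rate(queries, corpus, adversarial, k=1))
```

Counting strictly higher-scoring clean documents gives the injected document the benefit of ties; a stricter evaluation could use `>=` instead.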