🤖 AI Summary
This work addresses a security weakness in multi-turn vision-language dialogue systems: existing models can be manipulated through stealthy adversarial images that a benign user brings into the conversation. The paper introduces Visual Memory Injection (VMI), a novel attack in which an attacker publishes an adversarially perturbed image that a user later downloads and gives to the model as input. The model behaves nominally on ordinary prompts and outputs a prescribed target message only when the user gives a triggering prompt, which enables targeted manipulation such as adversarial marketing or political persuasion. Unlike prior single-turn attacks, VMI remains effective after a long multi-turn conversation. Experiments on several recent open-weight vision-language models demonstrate the attack and expose largely underexplored security risks in long-context, multi-turn settings.
📝 Abstract
Generative large vision-language models (LVLMs) have recently achieved impressive performance gains, and their user base is growing rapidly. However, the security of LVLMs, particularly in a long-context multi-turn setting, is largely underexplored. In this paper, we consider the realistic scenario in which an attacker uploads a manipulated image to the web or social media. A benign user downloads this image and uses it as input to the LVLM. Our novel stealthy Visual Memory Injection (VMI) attack is designed such that on normal prompts the LVLM exhibits nominal behavior, but once the user gives a triggering prompt, the LVLM outputs a specific prescribed target message to manipulate the user, e.g., for adversarial marketing or political persuasion. Compared to previous work that focused on single-turn attacks, VMI is effective even after a long multi-turn conversation with the user. We demonstrate our attack on several recent open-weight LVLMs. This paper thereby shows that large-scale manipulation of users is feasible with perturbed images in multi-turn conversation settings, calling for better robustness of LVLMs against these attacks. We release the source code at https://github.com/chs20/visual-memory-injection