Attention Hijacking: Response Manipulation Across Queries in Vision-Language Models

📅 2026-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the significant performance degradation of existing adversarial attacks on vision-language models under varying textual queries, which hinders stable cross-query response manipulation. The authors propose a novel attack method that, for the first time, links attention mechanism stability with cross-query attack efficacy. By explicitly guiding and hijacking the model’s internal attention distribution, the approach amplifies the influence of image tokens on the target response while suppressing interference from text tokens, thereby preserving an image-dominant attention pattern. Evaluated on mainstream vision-language models, this method substantially improves cross-query transferability and demonstrates strong robustness against unseen queries and diverse target responses, overcoming the conventional reliance on query-specific attack formulations.

📝 Abstract

Existing adversarial attacks on vision-language models (VLMs) can steer model outputs toward attacker-specified target responses, but their effectiveness often degrades when the same perturbed input is paired with different textual queries. This paper studies cross-query response manipulation, where a single adversarial example is expected to remain effective across diverse user queries. We first analyze the limitations of existing attacks and find that successful transfer is closely associated with preserving an image-dominant attention pattern during response generation. Motivated by this observation, we propose Attention Hijacking, a novel adversarial attack that explicitly steers internal attention distributions toward a persistent image-dominant pattern. By amplifying the influence of visual tokens on target response tokens while suppressing the competing influence of textual tokens, our method reduces the dependence of the manipulated output on the specific wording of the query. Extensive experiments on widely used VLMs show that Attention Hijacking substantially improves cross-query transferability across diverse target responses and unseen queries. The method also extends effectively to multiple attack scenarios, offering new insights into the role of attention stability in transferable response manipulation for VLMs.
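To make the notion of an "image-dominant attention pattern" concrete, the sketch below measures how much of the attention from response tokens falls on image tokens rather than on query text tokens. It is only an illustration under stated assumptions: the abstract does not give the paper's actual objective, so the function names `image_attention_share` and `attention_penalty`, the ratio-based score, and the toy token layout are all hypothetical.

```python
import numpy as np

def image_attention_share(attn, image_idx, text_idx, response_idx):
    """Fraction of response-token attention placed on image tokens.

    attn: array of shape (heads, seq, seq); each row is an attention
          distribution over key positions (assumed layout).
    Returns the image mass divided by (image + text) mass, averaged
    over heads and response positions.
    """
    rows = attn[:, response_idx, :]               # (heads, R, seq)
    img = rows[:, :, image_idx].sum(axis=-1)      # mass on image tokens
    txt = rows[:, :, text_idx].sum(axis=-1)       # mass on query text tokens
    share = img / (img + txt + 1e-12)             # guard against division by zero
    return float(share.mean())

def attention_penalty(attn, image_idx, text_idx, response_idx):
    """Hypothetical penalty: small when attention is image-dominant."""
    return 1.0 - image_attention_share(attn, image_idx, text_idx, response_idx)

# Toy layout (assumed): 3 image tokens, 2 query tokens, 1 response token.
image_idx, text_idx, response_idx = [0, 1, 2], [3, 4], [5]
uniform = np.full((2, 6, 6), 1.0 / 6.0)
print(image_attention_share(uniform, image_idx, text_idx, response_idx))
print(attention_penalty(uniform, image_idx, text_idx, response_idx))
```

In an actual attack, a term of this kind would presumably be combined with a loss on the target response and minimized over a bounded image perturbation; that combination is an assumption here, not a description of the authors' implementation.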

Problem

Research questions and friction points this paper is trying to address.

cross-query response manipulation

vision-language models

adversarial attacks

attention pattern

transferability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention Hijacking

cross-query transferability

vision-language models