🤖 AI Summary
Traditional adversarial attacks target only single-step decisions, making them ineffective at inducing cascading failures across multi-step reasoning chains. This work introduces the novel threat of “decision chain hijacking”: a single perturbation that simultaneously manipulates multiple downstream outputs of multimodal large language models (MLLMs)—e.g., misclassifying “bicycle lane” as “motor vehicle lane” while misidentifying “pedestrian” as “plastic bag.” To realize this threat, we propose Semantic-Aware Universal Perturbations (SAUPs), which integrate semantic guidance, normalized spatial search, and target-decoupling optimization to enable synchronized control over five distinct output categories via a single-frame perturbation. Evaluated on our newly constructed real-world dataset RIST, SAUPs achieve an average attack success rate of 70% across three state-of-the-art MLLMs. Our results expose a previously unrecognized systemic security vulnerability in MLLMs—namely, their susceptibility to targeted corruption of extended, interdependent reasoning chains.
📝 Abstract
Conventional adversarial attacks focus on manipulating a single decision of a neural network. However, real-world models often make a sequence of decisions, where an isolated mistake can be easily corrected but cascading errors can lead to severe risks.
This paper reveals a novel threat: a single perturbation can hijack the whole decision chain. We demonstrate the feasibility of manipulating a model's outputs toward multiple, predefined outcomes, such as simultaneously misclassifying "non-motorized lane" signs as "motorized lane" and "pedestrian" as "plastic bag".
To expose this threat, we introduce Semantic-Aware Universal Perturbations (SAUPs), which induce varied outcomes based on the semantics of the inputs. We overcome the resulting optimization challenges with an effective algorithm that searches for perturbations in a normalized space using a semantic separation strategy. To evaluate the practical threat of SAUPs, we present RIST, a new real-world image dataset with fine-grained semantic annotations. Extensive experiments on three multimodal large language models demonstrate this vulnerability, achieving a 70% attack success rate when controlling five distinct targets with a single adversarial frame.
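The core optimization idea described above — one shared perturbation, a *different* adversarial target per semantic group of inputs, searched inside a norm-bounded space — can be illustrated with a toy sketch. This is not the paper's algorithm: the linear "model" `W`, the two groups, and all numbers below are illustrative assumptions standing in for an MLLM and the RIST semantic categories.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a frozen model's decision head: a linear
# classifier over 8 features with 4 output classes.
W = rng.normal(size=(4, 8))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def total_loss(delta, groups):
    """Summed cross-entropy pushing each group toward its own target."""
    total = 0.0
    for xs, target in groups.values():
        for x in xs:
            total -= np.log(softmax(W @ (x + delta))[target])
    return total

# Two semantic groups, each mapped to a *different* adversarial target:
# this is the "semantic separation" idea — one perturbation, per-group goals.
groups = {
    "lane_sign":  (rng.normal(size=(5, 8)), 2),  # e.g. drive toward class 2
    "pedestrian": (rng.normal(size=(5, 8)), 3),  # e.g. drive toward class 3
}

eps = 2.0                      # perturbation budget (illustrative)
delta = np.zeros(8)            # the single universal perturbation
history = [total_loss(delta, groups)]

for _ in range(300):
    grad = np.zeros(8)
    n_samples = 0
    for xs, target in groups.values():
        for x in xs:
            p = softmax(W @ (x + delta))
            p[target] -= 1.0           # d(cross-entropy)/d(logits)
            grad += W.T @ p
            n_samples += 1
    delta -= 0.01 * grad / n_samples
    # Search in a normalized space: project back onto the eps-ball.
    norm = np.linalg.norm(delta)
    if norm > eps:
        delta *= eps / norm
    history.append(total_loss(delta, groups))
```

After the loop, `delta` stays within the `eps` budget while the combined per-group targeted loss has decreased. The actual SAUP attack optimizes in the pixel space of an adversarial frame against MLLM outputs, but the structure — shared perturbation, semantically separated targets, norm-constrained search — is the same.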