On the Feasibility of Hijacking MLLMs' Decision Chain via One Perturbation

📅 2025-11-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Traditional adversarial attacks target a single decision, so they cannot induce cascading failures across multi-step reasoning chains. This work introduces the threat of "decision chain hijacking": a single perturbation that simultaneously steers multiple downstream outputs of multimodal large language models (MLLMs) toward predefined targets, for example misclassifying a "non-motorized lane" sign as "motorized lane" while misidentifying a "pedestrian" as a "plastic bag". To realize this threat, the authors propose Semantic-Aware Universal Perturbations (SAUPs), which combine semantic guidance, a search in normalized space, and a semantic separation strategy so that a single adversarial frame exerts synchronized control over five distinct targets. Evaluated on RIST, a newly constructed real-world dataset with fine-grained semantic annotations, SAUPs achieve an average attack success rate of 70% across three state-of-the-art MLLMs. The results expose a previously unrecognized systemic vulnerability in MLLMs: their extended, interdependent reasoning chains are susceptible to targeted corruption.

📝 Abstract
Conventional adversarial attacks focus on manipulating a single decision of neural networks. However, real-world models often operate in a sequence of decisions, where an isolated mistake can be easily corrected, but cascading errors can lead to severe risks. This paper reveals a novel threat: a single perturbation can hijack the whole decision chain. We demonstrate the feasibility of manipulating a model's outputs toward multiple, predefined outcomes, such as simultaneously misclassifying "non-motorized lane" signs as "motorized lane" and "pedestrian" as "plastic bag". To expose this threat, we introduce Semantic-Aware Universal Perturbations (SAUPs), which induce varied outcomes based on the semantics of the inputs. We overcome optimization challenges by developing an effective algorithm, which searches for perturbations in normalized space with a semantic separation strategy. To evaluate the practical threat of SAUPs, we present RIST, a new real-world image dataset with fine-grained semantic annotations. Extensive experiments on three multimodal large language models demonstrate their vulnerability, achieving a 70% attack success rate when controlling five distinct targets using just an adversarial frame.
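The abstract describes the optimization only at a high level. As a rough, hypothetical sketch (not the paper's actual algorithm), the loop below shows one way a universal perturbation could be searched for in a normalized space while pushing each input toward a target tied to its semantic class; the `mllm_loss` surrogate, the image size, and all hyperparameters are assumptions, and a real attack on an MLLM would need a differentiable proxy for its decoded text.

```python
import torch

def optimize_saup(dataset, mllm_loss, epsilon=8 / 255, steps=500, lr=1e-2):
    """Search for one perturbation that steers every input in `dataset`
    toward the target assigned to its semantic class (all names assumed)."""
    # Optimize an unconstrained tensor w and squash it through tanh so the
    # perturbation always stays inside the epsilon ball (normalized space).
    w = torch.zeros(3, 224, 224, requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        total = 0.0
        for image, semantic_class in dataset:
            delta = epsilon * torch.tanh(w)      # perturbation in [-eps, eps]
            adv = (image + delta).clamp(0, 1)    # keep valid pixel range
            # Semantic separation: each input is pulled toward the target
            # for its own class, so one perturbation yields different
            # predefined outcomes on semantically different inputs.
            total = total + mllm_loss(adv, semantic_class)
        opt.zero_grad()
        total.backward()
        opt.step()
    return (epsilon * torch.tanh(w)).detach()
```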
Problem

Research questions and friction points this paper is trying to address.

Hijacking multimodal models' decision chains with a single perturbation
Manipulating model outputs toward multiple predefined incorrect outcomes
Demonstrating the vulnerability of MLLMs to cascading semantic errors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic-Aware Universal Perturbations hijack entire decision chains
Algorithm searches for perturbations in normalized space with a semantic separation strategy
A single perturbation controls multiple predefined semantic outcomes
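Both the summary and abstract report a 70% average attack success rate when controlling five targets at once. For concreteness, here is a minimal, hypothetical sketch of how such a per-decision success rate is commonly computed; the matching rule and data layout are assumptions, not the paper's evaluation code.

```python
def attack_success_rate(results):
    """results: (model_output, target_output) pairs, one per hijacked decision."""
    hits = sum(1 for output, target in results if target in output)
    return hits / len(results) if results else 0.0

# Example: if 7 of 10 targeted decisions match, ASR = 0.7, i.e., 70%.
```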
Changyue Li
The Chinese University of Hong Kong, Shenzhen
Jiaying Li
The Chinese University of Hong Kong, Shenzhen
Youliang Yuan
The Chinese University of Hong Kong, Shenzhen
Jiaming He
University of Electronic Science and Technology of China
Zhicong Huang
Ant Group
Cryptography, Security and Privacy, Machine Learning
Pinjia He
Assistant Professor, The Chinese University of Hong Kong, Shenzhen
Software Engineering, AI4SE, SE4AI, AIOps