StructBreak: Structural Cognitive Overload-Induced Safety Failures in MLLMs

📅 2026-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a vulnerability in multimodal large language models (MLLMs): structural cognitive overload (SCO) during structural consistency reasoning can bypass safety mechanisms and induce harmful outputs. The authors propose StructBreak, an automated end-to-end framework for quantifying SCO, and use it to identify a higher-order cognitive overload attack paradigm that operates in a black-box setting with no internal model access. Evaluated on six leading MLLMs, the attack achieves an average attack success rate (ASR) of 92% (up to 97% on Gemini 2.5). The study also introduces a benchmark spanning ten threat scenarios and interprets the mechanism of SCO through attention dynamics, latent space topology, and geometric analysis, concluding that current alignment strategies are insufficient against the safety challenges of complex multimodal reasoning.
📝 Abstract
Multimodal Large Language Models (MLLMs) excel at structural reasoning yet suffer from sharp logical brittleness in structural consistency. We term this phenomenon Structural Cognitive Overload (SCO), a byproduct of the contention between deep reasoning and safety alignment. Prior work, however, has predominantly targeted typographic and pixel-level perturbations, leaving SCO largely unexplored. To this end, we propose StructBreak, an automated end-to-end framework designed to quantify SCO. Using StructBreak, we uncover a novel higher-order cognitive overload attack paradigm; notably, this attack operates under a practical black-box setting, requiring no internal model access. We then use the framework to establish a comprehensive benchmark spanning ten diverse threat scenarios. Empirical evaluations on six leading MLLMs reveal that SCO readily triggers toxic generation, yielding a 92% average attack success rate (ASR), with up to 97% on Gemini 2.5. To elucidate the mechanism of SCO, we further conduct model-level interpretations spanning attention dynamics, latent space topology, and geometric analysis. Our findings reveal that StructBreak acts as a novel structural channel to circumvent safety filters. Furthermore, the limited efficacy of inherent safety mechanisms underscores that current alignment paradigms are insufficient for the era of complex multimodal reasoning.
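The abstract reports an average ASR across six models but does not say how it is computed or averaged. As a minimal sketch, assuming ASR is the fraction of attempts a judge marks as successful for each model and that the average is an unweighted mean over models, the arithmetic could look like this. The function names and the toy outcomes are hypothetical placeholders, not the paper's code or data.

```python
from statistics import mean


def attack_success_rate(judgements):
    """Fraction of attempts marked successful (True) by a judge.

    Assumption: one boolean per attempt; the paper's actual judging
    protocol is not described in the abstract.
    """
    if not judgements:
        raise ValueError("need at least one judgement")
    return sum(judgements) / len(judgements)


def average_asr(per_model):
    """Unweighted (macro) mean of per-model ASR values.

    Assumption: each model counts equally, regardless of how many
    attempts were run against it.
    """
    return mean(attack_success_rate(j) for j in per_model.values())


# Hypothetical placeholder outcomes, NOT results from the paper.
toy_outcomes = {
    "model_a": [True, True, False, True],
    "model_b": [True, False, False, True],
}

for name, judgements in toy_outcomes.items():
    print(f"{name}: ASR = {attack_success_rate(judgements):.2%}")
print(f"average ASR = {average_asr(toy_outcomes):.2%}")
```

If models were evaluated on different numbers of prompts, a pooled (micro) average over all attempts would generally give a different figure, so the averaging choice is worth stating alongside any reported ASR.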
Problem

Research questions and friction points this paper is trying to address.

Structural Cognitive Overload
Multimodal Large Language Models
Safety Failures
Structural Consistency
Cognitive Overload
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structural Cognitive Overload
Multimodal Large Language Models
Safety Alignment
Black-box Attack
StructBreak
🔎 Similar Papers
Yang Luo
Key Laboratory of Trustworthy Distributed Computing and Service (MoE), Beijing University of Posts and Telecommunications
Xinran Liu
Ph.D. candidate, Vanderbilt University
optimal transport, machine learning
Tiantian Ji
Key Laboratory of Trustworthy Distributed Computing and Service (MoE), Beijing University of Posts and Telecommunications
Zhiyi Yin
Institute of Computing Technology, Chinese Academy of Sciences
Lingyun Peng
Key Laboratory of Trustworthy Distributed Computing and Service (MoE), Beijing University of Posts and Telecommunications
Shuyu Li
Key Laboratory of Trustworthy Distributed Computing and Service (MoE), Beijing University of Posts and Telecommunications