🤖 AI Summary
This work addresses underexplored security vulnerabilities in multimodal multi-agent systems (MM-MAS) during complex collaborative tasks, where existing adversarial attacks are largely confined to single-agent or single-modality settings. The authors propose HAM³, a hierarchical adversarial attack framework designed for MM-MAS, which disrupts the entire perception-to-decision pipeline by attacking three layers: the perception layer (visual inputs, textual inputs, and their fused representations), the communication layer (message content and interaction topology), and the reasoning layer (each agent's reasoning trajectory). Experiments on the GQA benchmark show that HAM³ achieves an attack success rate of up to 78.3% against multi-agent systems built on the ReAct, Plan-and-Solve, and Reflexion reasoning paradigms, exposing vulnerabilities in cross-modal fusion and collaborative reasoning. Notably, more than half of the successful attacks lead multiple agents to converge on consistent erroneous conclusions.
📝 Abstract
Multi-modal multi-agent systems (MM-MAS) have gained increasing attention for their capacity to enable complex reasoning and coordination across diverse modalities. As these systems grow in scale and functionality, investigating their potential vulnerabilities has become increasingly important. However, existing studies on adversarial attacks in multi-agent systems primarily focus on isolated agents or unimodal settings, leaving the vulnerabilities of MM-MAS largely underexplored. To bridge this gap, we introduce HAM$^{3}$, a Hierarchical Attack framework for multi-modal multi-agent systems that decomposes attacks into three interconnected layers. At the perception layer, HAM$^{3}$ perturbs visual inputs, textual inputs, and their fused visual-textual representations. At the communication layer, it corrupts message content and interaction topology, for example by manipulating shared context or communication links to distort collective information flow. At the reasoning layer, it interferes with each agent's cognitive pipeline, biasing reasoning trajectories and ultimately compromising final decisions. We evaluate HAM$^{3}$ on the GQA benchmark using multi-agent systems built on distinct reasoning paradigms, including ReAct, Plan-and-Solve, and Reflexion. Experiments show that our framework achieves an Attack Success Rate of up to 78.3%, with reasoning-layer attacks being the most effective. More than half of the successful attacks lead multiple agents to produce consistent errors. These findings offer valuable insights for building more robust and interpretable multi-agent intelligence.
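The abstract reports two quantities, an Attack Success Rate and the share of successful attacks that produce consistent errors, without stating how they are computed. The sketch below shows one plausible way to compute them, not the authors' evaluation code. It assumes an attack counts as successful when the system answers correctly without the attack and incorrectly under it, and that an error is "consistent" when at least two agents give the same wrong answer. The `Trial` record and all function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trial:
    """One evaluation sample (hypothetical record layout, not from the paper)."""
    truth: str                 # ground-truth answer
    clean_answer: str          # system's final answer without any attack
    attacked_answer: str       # system's final answer under attack
    agent_answers: list[str]   # each agent's own answer under attack


def is_success(t: Trial) -> bool:
    # Assumed criterion: correct before the attack, incorrect after it.
    return t.clean_answer == t.truth and t.attacked_answer != t.truth


def attack_success_rate(trials: list[Trial]) -> float:
    # Denominator: only samples the clean system answers correctly.
    eligible = [t for t in trials if t.clean_answer == t.truth]
    if not eligible:
        return 0.0
    return sum(is_success(t) for t in eligible) / len(eligible)


def consistent_error_share(trials: list[Trial]) -> float:
    # Among successful attacks, the fraction in which at least two agents
    # agree on the same wrong answer (assumed definition of "consistent").
    successes = [t for t in trials if is_success(t)]
    if not successes:
        return 0.0

    def has_shared_error(t: Trial) -> bool:
        wrong = Counter(a for a in t.agent_answers if a != t.truth)
        return any(count >= 2 for count in wrong.values())

    return sum(has_shared_error(t) for t in successes) / len(successes)


# Toy data for illustration only; these are not results from the paper.
trials = [
    Trial("red", "red", "blue", ["blue", "blue", "red"]),         # success, shared error
    Trial("cat", "cat", "dog", ["dog", "bird", "cat"]),           # success, no shared error
    Trial("two", "two", "two", ["two", "two", "two"]),            # attack failed
    Trial("left", "right", "right", ["right", "right", "left"]),  # excluded: wrong when clean
]
asr = attack_success_rate(trials)
share = consistent_error_share(trials)
print(f"ASR: {asr:.3f}, consistent-error share: {share:.3f}")
```

Restricting the denominator to samples the clean system already answers correctly keeps baseline mistakes from being counted as attack successes. A different denominator, such as all samples, would yield a different rate, so the paper's own definition takes precedence.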