MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs

📅 2026-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the vulnerability of multimodal large language models (MLLMs) to jailbreak attacks, noting that existing approaches—relying on single-image masking or isolated visual cues—struggle to compromise strongly aligned, closed-source commercial models. To overcome this limitation, the authors propose MIDAS, a novel framework that introduces multi-image dispersion and semantic reconstruction. Specifically, harmful semantics are decomposed into risk subunits and distributed across multiple images; malicious intent is then incrementally reconstructed through cross-image visual cue fusion and chain-of-thought reasoning. This strategy significantly delays semantic exposure and evades textual safety detectors. Evaluated on four leading closed-source MLLMs, MIDAS achieves an average attack success rate of 81.46%, substantially outperforming current state-of-the-art methods.

📝 Abstract
Multimodal Large Language Models (MLLMs) have achieved remarkable performance but remain vulnerable to jailbreak attacks that can induce harmful content and undermine their secure deployment. Previous studies have shown that introducing additional inference steps, which disrupt security attention, can make MLLMs more susceptible to being misled into generating malicious content. However, these methods rely on single-image masking or isolated visual cues, which only modestly extend reasoning paths and thus achieve limited effectiveness, particularly against strongly aligned commercial closed-source models. To address this problem, we propose Multi-Image Dispersion and Semantic Reconstruction (MIDAS), a multimodal jailbreak framework that decomposes harmful semantics into risk-bearing subunits, disperses them across multiple visual clues, and leverages cross-image reasoning to gradually reconstruct the malicious intent, thereby bypassing existing safety mechanisms. MIDAS enforces longer, more structured multi-image chained reasoning, substantially increasing the model's reliance on visual cues while delaying the exposure of malicious semantics and reducing its security attention, thereby improving jailbreak performance against advanced MLLMs. Extensive experiments across different datasets and MLLMs demonstrate that MIDAS outperforms state-of-the-art jailbreak attacks on MLLMs, achieving an average attack success rate of 81.46% across four closed-source MLLMs. Our code is available at this [link](https://github.com/Winnie-Lian/MIDAS).
Problem

Research questions and friction points this paper is trying to address.

jailbreak attacks
Multimodal Large Language Models
security attention
harmful content
visual cues
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-image dispersion
semantic reconstruction
multimodal jailbreak
cross-image reasoning
security attention bypass