GPO-V: Jailbreak Diffusion Vision Language Model by Global Probability Optimization

📅 2026-05-08

🤖 AI Summary

Although diffusion-based vision-language models (dVLMs) appear robust against conventional jailbreak attacks, their progressive refusal behavior introduces a new security vulnerability. This work proposes Global Probability Optimization (GPO), a jailbreak paradigm that manipulates the global generative dynamics of the denoising process in diffusion models rather than optimizing a fixed response prefix. Building on GPO, the authors develop GPO-V, the first jailbreak framework for dVLMs that targets the visual modality. By leveraging masked diffusion models and joint vision-language perturbation generation, GPO-V achieves high stealthiness and strong cross-model transferability. The method successfully compromises multiple state-of-the-art dVLMs, revealing for the first time a critical security flaw inherent in non-autoregressive generative architectures.

📝 Abstract

Diffusion Vision-Language Models (dVLMs), built upon the non-causal foundations of Diffusion Large Language Models (dLLMs), have demonstrated remarkable efficacy in multimodal tasks by departing from the traditional autoregressive generation paradigm. While dVLMs appear inherently robust against conventional jailbreak tactics, which we categorize as Fixed Prefix Optimization (FPO) (e.g., anchoring responses with "Sure, here is"), this perceived resilience is deceptive. Our investigation into the safety landscape of dVLMs reveals two distinct refusal patterns: Immediate Refusal and Progressive Refusal. We find that while FPO-based attacks often fail by triggering the latter, the progressive refinement process itself exposes a novel, latent attack surface. To exploit this vulnerability, we propose Global Probability Optimization (GPO), a general jailbreak paradigm designed specifically for the denoising trajectory of masked diffusion models. Unlike prefix-based methods, GPO manipulates the global generative dynamics to bypass guardrails in diffusion language models. Building on this, we introduce GPO-V, the first visual-modality jailbreak framework tailored for dVLMs. Empirical results demonstrate that GPO-V produces stealthy perturbations with exceptional cross-model transferability, revealing a critical security gap in non-sequential generative architectures. Our findings underscore the urgency of addressing safety alignment in dVLMs and call for a fundamental re-evaluation of current defense paradigms to mitigate the unique risks of diffusion-based generation. Our code is available at: https://anonymous.4open.science/r/GPO-V-0250.

Problem

Research questions and friction points this paper is trying to address.

Diffusion Vision-Language Models

jailbreak

safety alignment

non-autoregressive generation

adversarial attacks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Global Probability Optimization

Diffusion Vision-Language Models

Jailbreak Attack