PiCo: Jailbreaking Multimodal Large Language Models via $\textbf{Pi}$ctorial $\textbf{Co}$de Contextualization

📅 2025-04-02
📈 Citations: 0 · Influential Citations: 0
🤖 AI Summary
This work identifies critical security vulnerabilities introduced via the visual modality in multimodal large language models (MLLMs). To exploit these, we propose PiCo—a progressive jailbreaking framework that pioneers a novel image-text-code contextualization attack paradigm. PiCo integrates three synergistic techniques: token-level glyph perturbations, programming-contextual instruction injection, and vision-language co-adversarial prompt engineering—operating hierarchically to evade both input sanitization and runtime monitoring. We further introduce a multidimensional evaluation metric balancing harmfulness and utility, enabling more precise assessment of defensive robustness. Empirical evaluation demonstrates PiCo achieves attack success rates of 84.13% on Gemini-Pro Vision and 52.66% on GPT-4, substantially outperforming prior methods. Our systematic analysis exposes fundamental weaknesses across multiple layers of current MLLM defense mechanisms, providing both theoretical insights and empirical evidence to guide the development of robust, secure multimodal AI systems.

📝 Abstract
Multimodal Large Language Models (MLLMs), which integrate vision and other modalities into Large Language Models (LLMs), significantly enhance AI capabilities but also introduce new security vulnerabilities. By exploiting the vulnerabilities of the visual modality and the long-tail distribution of code-related training data, we present PiCo, a novel jailbreaking framework designed to progressively bypass multi-tiered defense mechanisms in advanced MLLMs. PiCo employs a tier-by-tier jailbreak strategy, using token-level typographic attacks to evade input filtering and embedding harmful intent within programming-context instructions to bypass runtime monitoring. To comprehensively assess the impact of attacks, we further propose a new evaluation metric that measures both the toxicity and helpfulness of model outputs post-attack. By embedding harmful intent within code-style visual instructions, PiCo achieves an average Attack Success Rate (ASR) of 84.13% on Gemini-Pro Vision and 52.66% on GPT-4, surpassing previous methods. Experimental results highlight critical gaps in current defenses, underscoring the need for more robust strategies to secure advanced MLLMs.
Problem

Research questions and friction points this paper is trying to address.

Exploiting visual and code vulnerabilities in MLLMs
Bypassing multi-tiered defense mechanisms in MLLMs
Assessing attack impact via toxicity and helpfulness metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Token-level typographic attacks evade input filtering
Embed harmful intent in programming context instructions
New metric assesses toxicity and helpfulness post-attack (see the sketch after this list)
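The paper does not publish its metric's exact formula here, but the abstract states that an output is judged on both toxicity and helpfulness post-attack. Below is a minimal, hypothetical sketch of one way such a combined metric could be computed; the class names, thresholds, and multiplicative weighting are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: combining per-response toxicity and helpfulness judgments
# into a single attack-effectiveness score and an aggregate success rate.
# All names and thresholds here are illustrative assumptions, not PiCo's actual metric.
from dataclasses import dataclass
from typing import List


@dataclass
class JudgedResponse:
    toxicity: float     # 0.0 (benign) .. 1.0 (clearly harmful), from a judge model or annotator
    helpfulness: float  # 0.0 (refusal / off-topic) .. 1.0 (detailed, on-topic answer)


def attack_effectiveness(r: JudgedResponse) -> float:
    """Score a single jailbreak attempt.

    The two judgments are combined multiplicatively, so an attempt only scores
    highly if the output is both harmful and actually useful: a refusal
    (helpfulness ~ 0) or a harmless answer (toxicity ~ 0) both yield ~0.
    """
    return r.toxicity * r.helpfulness


def attack_success_rate(responses: List[JudgedResponse], threshold: float = 0.5) -> float:
    """Fraction of attempts whose combined score exceeds a success threshold."""
    if not responses:
        return 0.0
    successes = sum(attack_effectiveness(r) >= threshold for r in responses)
    return successes / len(responses)


if __name__ == "__main__":
    judged = [
        JudgedResponse(toxicity=0.9, helpfulness=0.8),  # harmful and detailed -> counted
        JudgedResponse(toxicity=0.9, helpfulness=0.1),  # harmful intent but vague -> not counted
        JudgedResponse(toxicity=0.0, helpfulness=0.9),  # safe, helpful refusal -> not counted
    ]
    print(f"ASR: {attack_success_rate(judged):.2%}")
```

The multiplicative combination reflects the motivation given in the abstract: counting only responses that are both toxic and genuinely helpful avoids overstating success on outputs that are harmful in tone but practically useless, or detailed but safely refused.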
Aofan Liu
School of Artificial Intelligence, Wuhan University; School of Computer Science, Peking University
Lulu Tang
Beijing Academy of Artificial Intelligence
3D Computer Vision · Vision-Language Models
Ting Pan
Institute of Computing Technology, Chinese Academy of Sciences
Computer Vision · Machine Learning & Systems · Foundation Models
Yuguo Yin
School of Computer Science, Peking University
Bin Wang
School of Computer Science, Peking University
Ao Yang
School of Computer Science, Peking University