Beyond Visual Safety: Jailbreaking Multimodal Large Language Models for Harmful Image Generation via Semantic-Agnostic Inputs

📅 2026-01-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses critical vulnerabilities in the visual safety alignment of multimodal large language models (MLLMs), demonstrating their inability to reliably prevent harmful image generation. The authors propose the BVS framework, which introduces a novel "reconstruction-then-generation" strategy that neutralizes malicious visual content through semantic decoupling: adversarial intent is disentangled from the original input via neutral visual stitching and inductive recombination. Coupled with multimodal prompt injection, this approach bypasses the safety mechanisms of state-of-the-art MLLMs using semantically irrelevant inputs. Experiments on GPT-5 (January 12, 2026 version) achieve a jailbreak success rate of 98.21%, exposing profound weaknesses in current visual safety defenses and establishing a new benchmark and perspective for research on multimodal alignment security.

📝 Abstract
The rapid advancement of Multimodal Large Language Models (MLLMs) has introduced complex security challenges, particularly at the intersection of textual and visual safety. While existing schemes have explored the security vulnerabilities of MLLMs, the investigation into their visual safety boundaries remains insufficient. In this paper, we propose Beyond Visual Safety (BVS), a novel image-text pair jailbreaking framework specifically designed to probe the visual safety boundaries of MLLMs. BVS employs a "reconstruction-then-generation" strategy, leveraging neutralized visual splicing and inductive recomposition to decouple malicious intent from raw inputs, thereby inducing MLLMs to generate harmful images. Experimental results demonstrate that BVS achieves a remarkable jailbreak success rate of 98.21% against GPT-5 (January 12, 2026 release). Our findings expose critical vulnerabilities in the visual safety alignment of current MLLMs.
Problem

Research questions and friction points this paper is trying to address.

Multimodal Large Language Models
visual safety
harmful image generation
jailbreaking
security vulnerabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal large language models
visual safety
jailbreaking
semantic-agnostic inputs
harmful image generation
Mingyu Yu
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China; School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China
Lana Liu
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China; School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China
Zhehao Zhao
Peking University
Software Engineering, Programming Languages, Formal Methods, System Software
Wei Wang
Meituan, Alibaba Group, BUPT
Natural Language Processing, Deep Learning, Foundation Models, Reasoning
Sujuan Qin
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China; School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China