🤖 AI Summary
Existing multimodal large language model (MLLM) safety benchmarks suffer from narrow attack scenarios, a lack of standardized defense evaluation, and non-reproducible tooling. To address these limitations, this paper introduces the first unified multimodal jailbreaking benchmark for comprehensive attack-defense evaluation. It integrates 13 attack methods, 15 defense strategies, and a high-quality dataset spanning nine critical risk domains. The authors propose a three-dimensional safety assessment framework, measuring harmfulness, intent consistency, and response comprehensiveness, to jointly quantify safety and utility. They also design novel multimodal safety data construction techniques and a modular attack-defense integration framework, and release an open-source, reproducible evaluation platform supporting systematic comparison of both open- and closed-source MLLMs. Extensive experiments across 10 open-source and 8 closed-source models reveal widespread vulnerabilities, advancing standardization and reproducibility in multimodal safety evaluation.
📝 Abstract
Recent advances in multi-modal large language models (MLLMs) have enabled unified perception-reasoning capabilities, yet these systems remain highly vulnerable to jailbreak attacks that bypass safety alignment and induce harmful behaviors. Existing benchmarks such as JailBreakV-28K, MM-SafetyBench, and HADES provide valuable insights into multi-modal vulnerabilities, but they typically focus on limited attack scenarios, lack standardized defense evaluation, and offer no unified, reproducible toolbox. To address these gaps, we introduce OmniSafeBench-MM, a comprehensive toolbox for multi-modal jailbreak attack-defense evaluation. OmniSafeBench-MM integrates 13 representative attack methods, 15 defense strategies, and a diverse dataset spanning 9 major risk domains and 50 fine-grained categories, structured across consultative, imperative, and declarative inquiry types to reflect realistic user intentions. Beyond data coverage, it establishes a three-dimensional evaluation protocol measuring (1) harmfulness, graded on a multi-level scale ranging from low-impact individual harm to catastrophic societal threats, (2) intent alignment between responses and queries, and (3) response detail level, enabling nuanced safety-utility analysis. We conduct extensive experiments on 10 open-source and 8 closed-source MLLMs to reveal their vulnerability to multi-modal jailbreak attacks. By unifying data, methodology, and evaluation into an open-source, reproducible platform, OmniSafeBench-MM provides a standardized foundation for future research. The code is released at https://github.com/jiaxiaojunQAQ/OmniSafeBench-MM.