UniSAFE: A Comprehensive Benchmark for Safety Evaluation of Unified Multimodal Models

📅 2026-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing safety evaluation benchmarks are highly fragmented across tasks and modalities, making it difficult to systematically assess the safety risks of unified multimodal models (UMMs). To address this gap, this work proposes UniSAFE, the first system-level safety benchmark tailored for UMMs, which uses a shared-target design to unify risk scenarios across seven input–output modality combinations. The benchmark comprises 6,802 human-reviewed test instances and explicitly covers high-risk settings such as multi-image composition and multi-turn interactions. Evaluations of 15 prominent UMMs with this framework reveal that violation rates in image-generation tasks are significantly higher than in text-based tasks, and that safety risks are markedly exacerbated in multi-image and multi-turn contexts.

📝 Abstract
Unified Multimodal Models (UMMs) offer powerful cross-modality capabilities but introduce new safety risks not observed in single-task models. Despite their emergence, existing safety benchmarks remain fragmented across tasks and modalities, limiting the comprehensive evaluation of complex system-level vulnerabilities. To address this gap, we introduce UniSAFE, the first comprehensive benchmark for system-level safety evaluation of UMMs across 7 I/O modality combinations, spanning conventional tasks and novel multimodal-context image generation settings. UniSAFE is built with a shared-target design that projects common risk scenarios across task-specific I/O configurations, enabling controlled cross-task comparisons of safety failures. UniSAFE comprises 6,802 curated instances, which we use to evaluate 15 state-of-the-art UMMs, both proprietary and open-source. Our results reveal critical vulnerabilities across current UMMs, including elevated safety violations in multi-image composition and multi-turn settings, with image-output tasks consistently more vulnerable than text-output tasks. These findings highlight the need for stronger system-level safety alignment for UMMs. Our code and data are publicly available at https://github.com/segyulee/UniSAFE.
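The headline metric behind the abstract's findings is a per-task violation rate: the fraction of benchmark instances for which a model's response is judged unsafe, compared across I/O modality combinations. A minimal sketch of that aggregation is below; the record fields (`task`, `violation`) and task names are illustrative assumptions, not the benchmark's actual schema.

```python
# Hedged sketch: per-task violation-rate aggregation, the kind of statistic a
# UniSAFE-style evaluation reports. Field names and task labels are
# illustrative assumptions, not the benchmark's actual data format.
from collections import defaultdict

def violation_rates(records):
    """Return {task: fraction of responses judged unsafe} for each I/O task."""
    totals = defaultdict(int)
    violations = defaultdict(int)
    for r in records:
        totals[r["task"]] += 1
        violations[r["task"]] += int(r["violation"])
    return {task: violations[task] / totals[task] for task in totals}

# Toy judged results (invented data) illustrating the comparison the paper
# draws between image-output and text-output tasks.
records = [
    {"task": "text->image", "violation": True},
    {"task": "text->image", "violation": True},
    {"task": "text->image", "violation": False},
    {"task": "text->text", "violation": False},
    {"task": "text->text", "violation": True},
]
rates = violation_rates(records)
# rates["text->image"] == 2/3, rates["text->text"] == 0.5
```

Grouping by the input–output combination rather than by risk scenario is what makes the cross-task comparison controlled: the shared-target design holds the underlying risk fixed while the modality configuration varies.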
Problem

Research questions and friction points this paper is trying to address.

Unified Multimodal Models
Safety Evaluation
Multimodal Safety
System-level Vulnerabilities
Cross-modality Risks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified Multimodal Models
Safety Benchmark
System-level Evaluation
Cross-modality Safety
Multimodal Alignment
Segyu Lee
KAIST AI
Boryeong Cho
KAIST AI
Hojung Jung
KATECH (Korea Automotive Technology Institute)
Automated Driving System · Machine Learning · Computer Vision · Robotics · LiDAR
Seokhyun An
Department of Computer Science and Engineering, UNIST
Juhyeong Kim
Department of Mathematical Sciences, KAIST
Jaehyun Kwak
Ph.D. student @ KAIST AI
multi-modal learning · federated learning
Yongjin Yang
University of Toronto
RL · LLM Agent · Generative Models · Alignment
Sangwon Jang
KAIST AI
Youngrok Park
KAIST AI
Generative Modelling · Statistical Decision Making · Optimization
Wonjun Chang
KAIST CS
Se-Young Yun
KAIST AI