🤖 AI Summary
Existing safety evaluation benchmarks are highly fragmented across tasks and modalities, making it difficult to systematically assess the safety risks of unified multimodal models (UMMs). To address this gap, this work proposes UniSAFE, the first system-level safety benchmark tailored to UMMs. It uses a shared-target design to unify risk scenarios across seven input–output modality combinations. The benchmark comprises 6,802 human-reviewed test instances and explicitly covers high-risk settings such as multi-image composition and multi-turn interaction. An evaluation of 15 prominent UMMs on this benchmark shows that violation rates on image-output tasks are consistently higher than on text-output tasks. Safety risks are also markedly exacerbated in multi-image and multi-turn settings.
📝 Abstract
Unified Multimodal Models (UMMs) offer powerful cross-modality capabilities but introduce new safety risks not observed in single-task models. Despite the emergence of UMMs, existing safety benchmarks remain fragmented across tasks and modalities, which limits comprehensive evaluation of complex system-level vulnerabilities. To address this gap, we introduce UniSAFE, the first comprehensive benchmark for system-level safety evaluation of UMMs across 7 I/O modality combinations, spanning conventional tasks and novel multimodal-context image generation settings. UniSAFE is built with a shared-target design that projects common risk scenarios onto task-specific I/O configurations, enabling controlled cross-task comparisons of safety failures. Using UniSAFE's 6,802 curated instances, we evaluate 15 state-of-the-art UMMs, both proprietary and open-source. Our results reveal critical vulnerabilities across current UMMs, including elevated safety violation rates in multi-image composition and multi-turn settings, with image-output tasks consistently more vulnerable than text-output tasks. These findings highlight the need for stronger system-level safety alignment in UMMs. Our code and data are publicly available at https://github.com/segyulee/UniSAFE
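The shared-target idea described above, where one risk scenario is projected into every I/O configuration so that per-configuration violation rates can be compared on identical content, can be illustrated with a minimal Python sketch. This is not UniSAFE's implementation: the names `IO_CONFIGS`, `Instance`, `project`, and `violation_rates` are hypothetical. The three configuration labels are placeholders rather than the benchmark's actual seven combinations, and the per-instance violation flags are assumed to come from some external safety judge.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical placeholder I/O configurations; UniSAFE's actual seven
# combinations are defined in the paper, not here.
IO_CONFIGS = ("text->text", "text->image", "image+text->image")


@dataclass(frozen=True)
class Instance:
    target_id: str  # shared risk target this instance was projected from
    io_config: str  # task-specific input/output configuration
    prompt: str     # hypothetical rendered prompt for that configuration


def project(target_id: str, target: str) -> list[Instance]:
    """Project one shared risk target into every I/O configuration.

    The f-string template is a stand-in for real task-specific prompt
    construction.
    """
    return [Instance(target_id, cfg, f"[{cfg}] {target}") for cfg in IO_CONFIGS]


def violation_rates(results: list[tuple[Instance, bool]]) -> dict[str, float]:
    """Return the fraction of judged-unsafe responses per I/O configuration.

    Each result pairs an instance with a boolean violation flag, assumed
    to be produced by an external safety judge.
    """
    totals: dict[str, int] = defaultdict(int)
    unsafe: dict[str, int] = defaultdict(int)
    for inst, violated in results:
        totals[inst.io_config] += 1
        unsafe[inst.io_config] += int(violated)
    return {cfg: unsafe[cfg] / totals[cfg] for cfg in totals}
```

Because every configuration is built from the same set of targets, a difference in violation rate between two configurations reflects the I/O setting rather than differences in scenario content. That is the sense in which a shared-target design enables controlled cross-task comparison.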