🤖 AI Summary
This paper addresses a gap in the cross-modal safety alignment of large vision-language models (LVLMs) by introducing a new safety challenge, “Safe Inputs but Unsafe Output” (SIUO), in which individually benign unimodal inputs jointly trigger hazardous or unethical outputs. Method: We formally define and empirically validate this problem; construct SIUO, the first cross-modal safety benchmark for LVLMs, covering nine high-risk ethical domains; and propose multimodal adversarial prompting, cross-modal risk annotation, and a 9-dimensional ethical modeling framework to enable robust safety evaluation across open- and closed-source LVLMs. Contribution/Results: Experiments reveal severe safety vulnerabilities across state-of-the-art models, including GPT-4V and LLaVA, demonstrating pervasive failures in cross-modal safety alignment. SIUO provides a rigorous, reproducible foundation for diagnosing and improving multimodal safety, and highlights the need for principled cross-modal alignment mechanisms.
📝 Abstract
As Artificial General Intelligence (AGI) becomes increasingly integrated into various facets of human life, ensuring the safety and ethical alignment of such systems is paramount. Previous studies primarily focus on single-modality threats, which may not suffice given the integrated and complex nature of cross-modality interactions. We introduce a novel safety alignment challenge called Safe Inputs but Unsafe Output (SIUO) to evaluate cross-modality safety alignment. Specifically, it considers cases where each modality is safe on its own but the combination could lead to unsafe or unethical outputs. To empirically investigate this problem, we developed SIUO, a cross-modality benchmark encompassing 9 critical safety domains, such as self-harm, illegal activities, and privacy violations. Our findings reveal substantial safety vulnerabilities in both closed- and open-source LVLMs, such as GPT-4V and LLaVA, underscoring the inability of current models to reliably interpret and respond to complex, real-world scenarios.