Safe Inputs but Unsafe Output: Benchmarking Cross-modality Safety Alignment of Large Vision-Language Model

📅 2024-06-21

📈 Citations: 4

✨ Influential: 0

career value

207K/year

🤖 AI Summary

This paper addresses the critical gap in cross-modal safety alignment of large vision-language models (LVLMs), introducing the novel safety challenge “Safe Input but Unsafe Output” (SIUO)—where individually benign unimodal inputs jointly trigger hazardous or unethical outputs. Method: We formally define and empirically validate this paradox, construct SIUO—the first cross-modal safety benchmark for LVLMs—covering nine high-risk ethical domains; and propose multimodal adversarial prompting, cross-modal risk annotation, and a 9-dimensional ethical modeling framework to enable robust safety evaluation across open- and closed-source LVLMs. Contribution/Results: Experiments reveal severe safety vulnerabilities across state-of-the-art models—including GPT-4V and LLaVA—demonstrating pervasive failures in cross-modal safety alignment. SIUO provides a rigorous, reproducible foundation for diagnosing and improving multimodal safety, highlighting urgent needs for principled cross-modal alignment mechanisms.

Technology Category

Application Category

📝 Abstract

As Artificial General Intelligence (AGI) becomes increasingly integrated into various facets of human life, ensuring the safety and ethical alignment of such systems is paramount. Previous studies primarily focus on single-modality threats, which may not suffice given the integrated and complex nature of cross-modality interactions. We introduce a novel safety alignment challenge called Safe Inputs but Unsafe Output (SIUO) to evaluate cross-modality safety alignment. Specifically, it considers cases where single modalities are safe independently but could potentially lead to unsafe or unethical outputs when combined. To empirically investigate this problem, we developed the SIUO, a cross-modality benchmark encompassing 9 critical safety domains, such as self-harm, illegal activities, and privacy violations. Our findings reveal substantial safety vulnerabilities in both closed- and open-source LVLMs, such as GPT-4V and LLaVA, underscoring the inadequacy of current models to reliably interpret and respond to complex, real-world scenarios.

Problem

Research questions and friction points this paper is trying to address.

Evaluating cross-modality safety alignment

Identifying unsafe outputs from safe inputs

Benchmarking LVLMs in critical safety domains

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-modality safety alignment benchmark

Evaluates single-modality safe inputs

Assesses unsafe outputs in combinations

🔎 Similar Papers

Cross-Modal Safety Alignment: Is textual unlearning all you need?