Safe Inputs but Unsafe Output: Benchmarking Cross-modality Safety Alignment of Large Vision-Language Model

📅 2024-06-21
📈 Citations: 4
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the critical gap in cross-modal safety alignment of large vision-language models (LVLMs), introducing the novel safety challenge “Safe Input but Unsafe Output” (SIUO)—where individually benign unimodal inputs jointly trigger hazardous or unethical outputs. Method: We formally define and empirically validate this paradox, construct SIUO—the first cross-modal safety benchmark for LVLMs—covering nine high-risk ethical domains; and propose multimodal adversarial prompting, cross-modal risk annotation, and a 9-dimensional ethical modeling framework to enable robust safety evaluation across open- and closed-source LVLMs. Contribution/Results: Experiments reveal severe safety vulnerabilities across state-of-the-art models—including GPT-4V and LLaVA—demonstrating pervasive failures in cross-modal safety alignment. SIUO provides a rigorous, reproducible foundation for diagnosing and improving multimodal safety, highlighting urgent needs for principled cross-modal alignment mechanisms.

Technology Category

Application Category

📝 Abstract
As Artificial General Intelligence (AGI) becomes increasingly integrated into various facets of human life, ensuring the safety and ethical alignment of such systems is paramount. Previous studies primarily focus on single-modality threats, which may not suffice given the integrated and complex nature of cross-modality interactions. We introduce a novel safety alignment challenge called Safe Inputs but Unsafe Output (SIUO) to evaluate cross-modality safety alignment. Specifically, it considers cases where single modalities are safe independently but could potentially lead to unsafe or unethical outputs when combined. To empirically investigate this problem, we developed the SIUO, a cross-modality benchmark encompassing 9 critical safety domains, such as self-harm, illegal activities, and privacy violations. Our findings reveal substantial safety vulnerabilities in both closed- and open-source LVLMs, such as GPT-4V and LLaVA, underscoring the inadequacy of current models to reliably interpret and respond to complex, real-world scenarios.
Problem

Research questions and friction points this paper is trying to address.

Evaluating cross-modality safety alignment
Identifying unsafe outputs from safe inputs
Benchmarking LVLMs in critical safety domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-modality safety alignment benchmark
Evaluates single-modality safe inputs
Assesses unsafe outputs in combinations
S
Siyin Wang
School of Computer Science, Fudan University
X
Xingsong Ye
School of Computer Science, Fudan University
Q
Qinyuan Cheng
School of Computer Science, Fudan University, Shanghai AI Laboratory
Junwen Duan
Junwen Duan
Central South University
Artificial IntelligenceNatural Language ProcessingSocial Computing
Shimin Li
Shimin Li
Fudan University
Large Language ModelSpeech Language Model
Jinlan Fu
Jinlan Fu
National University of Singapore
Natural Language ProcessingVision and LanguageLarge Language Model
X
Xipeng Qiu
School of Computer Science, Fudan University
X
Xuanjing Huang
School of Computer Science, Fudan University