MSTS: A Multimodal Safety Test Suite for Vision-Language Models

📅 2025-01-17
📈 Citations: 0 (influential: 0)
🤖 AI Summary
This study systematically uncovers a latent safety risk in vision-language models (VLMs): hazardous behavior triggered by joint image-text inputs. Method: We introduce MSTS, a multimodal safety test suite for VLMs comprising 400 test prompts across 40 fine-grained hazard categories; each prompt pairs a text with an image that only in combination reveal their full unsafe meaning. We also translate MSTS into ten languages for cross-lingual safety assessment and explore automating VLM safety assessment with classifiers. Contribution/Results: We find clear safety issues in several open VLMs; show that non-English prompts increase the rate of unsafe responses (by 3.2× on average); identify and validate "accidental safety", where models appear safe only because they fail to understand even simple test prompts; demonstrate that unimodal text-only safety testing underestimates real-world multimodal risks; and show that even the best safety classifier achieves only F1 = 0.61, underscoring the critical need for rigorous multimodal safety evaluation.

📝 Abstract
Vision-language models (VLMs), which process image and text inputs, are increasingly integrated into chat assistants and other consumer AI applications. Without proper safeguards, however, VLMs may give harmful advice (e.g. how to self-harm) or encourage unsafe behaviours (e.g. to consume drugs). Despite these clear hazards, little work so far has evaluated VLM safety and the novel risks created by multimodal inputs. To address this gap, we introduce MSTS, a Multimodal Safety Test Suite for VLMs. MSTS comprises 400 test prompts across 40 fine-grained hazard categories. Each test prompt consists of a text and an image that only in combination reveal their full unsafe meaning. With MSTS, we find clear safety issues in several open VLMs. We also find some VLMs to be safe by accident, meaning that they are safe because they fail to understand even simple test prompts. We translate MSTS into ten languages, showing non-English prompts to increase the rate of unsafe model responses. We also show models to be safer when tested with text only rather than multimodal prompts. Finally, we explore the automation of VLM safety assessments, finding even the best safety classifiers to be lacking.
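The abstract's final experiment scores automated safety classifiers by F1 against human judgements of whether a model response is unsafe. As a minimal, hypothetical sketch (not the authors' code), the binary F1 metric reported in the paper can be computed like this; all labels below are made-up illustrative data:

```python
# Sketch of scoring a safety classifier against human "unsafe" labels.
# True = the model response was judged unsafe. Data here is illustrative only.

def f1_score(gold, pred):
    """Binary F1 for the 'unsafe' class: harmonic mean of precision and recall."""
    tp = sum(g and p for g, p in zip(gold, pred))          # correctly flagged unsafe
    fp = sum((not g) and p for g, p in zip(gold, pred))    # safe, but flagged unsafe
    fn = sum(g and (not p) for g, p in zip(gold, pred))    # unsafe, but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy annotations: human gold labels vs. classifier predictions.
gold = [True, True, False, False, True, False]
pred = [True, False, False, True, True, False]
print(round(f1_score(gold, pred), 2))  # prints 0.67
```

An F1 of 0.61, as reported for the best classifier in the paper, means the classifier both misses genuinely unsafe responses and flags safe ones at a substantial rate.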
Problem

Research questions and friction points this paper is trying to address.

Vision-Language Models
Safety Risks
Harmful Behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

MSTS
Vision-Language Models
Multilingual Safety Assessment
Paul Röttger
Postdoctoral Researcher, Bocconi University
Large Language Models, Safety and Societal Impacts of AI Systems
Giuseppe Attanasio
Postdoctoral Researcher, Instituto de Telecomunicações
AI, Fairness, Transparency, Safety
Felix Friedrich
Postdoc @ Meta FAIR, Montreal
Multimodal AI, Generative AI, AI Alignment, AI Safety
Janis Goldzycher
University of Zurich
Alicia Parrish
Google DeepMind
cognitive science, crowdsourcing, data-centric AI, responsible AI
Rishabh Bhardwaj
Singapore University of Technology and Design
Natural Language Processing, Machine Learning
C. D. Bonaventura
King’s College London, Imperial College London
Roman Eng
Clarkson University
Gaia El Khoury Geagea
Bocconi University
Sujata Goswami
Lawrence Berkeley National Laboratory
Jieun Han
KAIST
NLP, HCI
Dirk Hovy
Bocconi University
Natural Language Processing, Machine Learning, Computational Sociolinguistics, Computational Social Science, Ethics in NLP
Seogyeong Jeong
KAIST
NLP, LLM
Paloma Jeretič
University of Pennsylvania
F. Plaza-del-Arco
Bocconi University
Donya Rooein
Postdoc at Bocconi University
Natural Language Processing, Machine Learning, Conversational AI, AI for Education
P. Schramowski
TU Darmstadt, Hessian.AI, DFKI, CERTAIN
Anastassia Shaitarova
University of Zurich
Xudong Shen
National University of Singapore
Richard Willats
Contextual AI
Andrea Zugarini
Expert.ai, University of Siena
Artificial Intelligence, Machine Learning, Natural Language Processing
Bertie Vidgen
Oxford, Mercor
Evals, MCP + RAG, Alignment + Safety, Content Moderation