KSAFE-MM: A Multimodal Safety Benchmark via Localized Contextualization for Korean Cultural Risks

📅 2026-05-27
🤖 AI Summary
This work addresses the English-centric bias of existing multimodal safety evaluations, which often overlook risks rooted in local cultural contexts. It introduces KSAFE-MM, the first multimodal safety benchmark tailored to Korean culture, built from linguistic contextualization, culturally grounded visual queries, and jailbreak-style textual prompts so that general and culture-specific vulnerabilities can be assessed together. Experiments on twelve state-of-the-art multimodal large language models show that models are more vulnerable to culturally grounded attacks than to generic ones, and that jailbreaking strategies substantially raise the attack success rate (ASR): ProgramExecution reaches up to 74.2% ASR, compared with 13.4% for standard queries. The results also reveal a systematic trade-off between safety and over-refusal: models with low ASR tend to refuse benign queries excessively.
📝 Abstract
Multimodal Large Language Models (MLLMs) exacerbate safety risks by introducing vulnerabilities across multiple modalities, such as language and vision. Current MLLM safety evaluation tools, however, suffer from major limitations: 1) English-centric dataset construction, and 2) a focus on generic risks that are not tied to local cultural contexts. This paper introduces KSAFE-MM, a benchmark for Korean multimodal safety evaluation that covers both general safety risks and culture-specific vulnerabilities. KSAFE-MM consists of two parts, KSAFE-MM-G and KSAFE-MM-C. KSAFE-MM-G evaluates globally shared risks in Korean contexts through linguistic contextualization, which transforms generic safety queries into contextually grounded multimodal samples. KSAFE-MM-C targets culture-dependent MLLM safety vulnerabilities using localized visual queries derived from real-world contexts. It pairs these visual queries with jailbreak-style textual queries to cover multimodal safety risks involving cultural visual cues and malicious textual intent. Together, these components provide a general-to-local construction pipeline for evaluating both globally shared safety risks and culture-specific vulnerabilities. We evaluate 12 state-of-the-art MLLMs on KSAFE-MM and reveal that models exhibit greater vulnerability to culturally grounded attacks than to generic ones. Notably, jailbreaking strategies substantially amplify attack success rates, with ProgramExecution yielding up to 74.2% ASR compared to 13.4% for standard queries. Furthermore, we identify a systematic trade-off between safety and over-refusal, where models achieving low ASR tend to exhibit excessive refusal behavior on benign queries. These findings highlight the urgent need for culturally grounded safety evaluation beyond English-centric benchmarks.
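The two quantities the abstract contrasts, attack success rate (ASR) and over-refusal on benign queries, can be illustrated with a minimal sketch. It assumes the common definitions (ASR as the share of harmful queries the model complies with, over-refusal as the share of benign queries it refuses); the abstract does not state the paper's exact scoring procedure. The `Judgment` record, its field names, and the toy data are hypothetical and are not results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Judgment:
    """One judged model response (hypothetical record layout)."""
    strategy: str        # e.g. "standard" or "ProgramExecution"
    harmful_query: bool  # True for an attack query, False for a benign one
    complied: bool       # True if the model answered instead of refusing


def attack_success_rate(judgments):
    """Per-strategy ASR: fraction of harmful queries the model complied with."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for j in judgments:
        if not j.harmful_query:
            continue  # benign queries do not count toward ASR
        totals[j.strategy] += 1
        hits[j.strategy] += int(j.complied)
    return {s: hits[s] / totals[s] for s in totals}


def over_refusal_rate(judgments):
    """Fraction of benign queries that the model refused."""
    benign = [j for j in judgments if not j.harmful_query]
    if not benign:
        return 0.0
    return sum(not j.complied for j in benign) / len(benign)


# Toy data for illustration only (assumed, not taken from KSAFE-MM).
toy = (
    [Judgment("standard", True, c) for c in (False, False, False, True)]
    + [Judgment("ProgramExecution", True, c) for c in (True, True, True, False)]
    + [Judgment("standard", False, c) for c in (True, False)]
)

asr = attack_success_rate(toy)
refusal = over_refusal_rate(toy)
print(asr, refusal)
```

Reporting both numbers side by side is what exposes the trade-off described above: a model can lower its ASR simply by refusing more often, which shows up as a higher over-refusal rate on benign queries.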
Problem

Research questions and friction points this paper is trying to address.

Multimodal Large Language Models
Safety Evaluation
Cultural Context
Korean Cultural Risks
Multimodal Safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal safety benchmark
cultural contextualization
localized visual queries
jailbreak attacks
MLLM evaluation