Lingua-SafetyBench: A Benchmark for Safety Evaluation of Multilingual Vision-Language Models

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing evaluation benchmarks struggle to comprehensively assess safety risks of vision-language models in multilingual and multimodal settings and lack semantically aligned harmful image-text pairs. This work proposes the first safety evaluation framework that disentangles language and modality dimensions, introducing a benchmark dataset comprising 100,440 semantically aligned harmful image-text pairs across 10 languages. The framework distinguishes between image-dominant and text-dominant risk types and employs red-teaming attacks alongside human and automated evaluations to systematically test 11 open-source vision-language models. The study reveals that high-resource languages are more susceptible to image-dominant attacks, whereas low-resource languages exhibit greater vulnerability under text-dominant risks. Although model scaling reduces overall attack success rates, it exacerbates disparities in safety performance across languages.

📝 Abstract
Robust safety of vision-language large models (VLLMs) under joint multilingual and multimodal inputs remains underexplored. Existing benchmarks are typically multilingual but text-only, or multimodal but monolingual. Recent multilingual multimodal red-teaming efforts render harmful prompts into images, yet rely heavily on typography-style visuals and lack semantically grounded image-text pairs, limiting coverage of realistic cross-modal interactions. We introduce Lingua-SafetyBench, a benchmark of 100,440 harmful image-text pairs across 10 languages, explicitly partitioned into image-dominant and text-dominant subsets to disentangle risk sources. Evaluating 11 open-source VLLMs reveals a consistent asymmetry: image-dominant risks yield higher ASR in high-resource languages, while text-dominant risks are more severe in non-high-resource languages. A controlled study on the Qwen series shows that scaling and version upgrades reduce Attack Success Rate (ASR) overall but disproportionately benefit HRLs, widening the gap between HRLs and Non-HRLs under text-dominant risks. This underscores the necessity of language- and modality-aware safety alignment beyond mere scaling. To facilitate reproducibility and future research, we will publicly release our benchmark, model checkpoints, and source code. The code and dataset will be available at https://github.com/zsxr15/Lingua-SafetyBench. Warning: this paper contains examples with unsafe content.
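The abstract's central metric is Attack Success Rate (ASR), broken down by language and by the benchmark's image-dominant vs. text-dominant split. A minimal sketch of that per-group computation is below; it assumes the standard red-teaming definition of ASR (fraction of attack prompts whose response is judged unsafe), and the field names are illustrative, not the paper's actual schema.

```python
# Hedged sketch: per-group Attack Success Rate (ASR), assuming the
# common definition used in red-teaming evaluations -- the fraction
# of attack prompts for which the model response is judged unsafe.
# Grouping keys mirror the benchmark's language x risk-type split;
# record fields ('language', 'risk_type', 'unsafe') are hypothetical.
from collections import defaultdict

def attack_success_rate(results):
    """results: iterable of dicts with keys 'language',
    'risk_type' ('image-dominant' | 'text-dominant'), and
    'unsafe' (bool, from a human or automated safety judge).
    Returns {(language, risk_type): ASR}."""
    totals = defaultdict(int)   # attack prompts per group
    hits = defaultdict(int)     # unsafe responses per group
    for r in results:
        key = (r["language"], r["risk_type"])
        totals[key] += 1
        hits[key] += int(r["unsafe"])
    return {key: hits[key] / totals[key] for key in totals}

# Toy example: two English image-dominant attacks (one succeeds)
# and one Swahili text-dominant attack (succeeds).
demo = [
    {"language": "en", "risk_type": "image-dominant", "unsafe": True},
    {"language": "en", "risk_type": "image-dominant", "unsafe": False},
    {"language": "sw", "risk_type": "text-dominant", "unsafe": True},
]
print(attack_success_rate(demo))
# → {('en', 'image-dominant'): 0.5, ('sw', 'text-dominant'): 1.0}
```

Comparing these per-group rates is what surfaces the asymmetry the paper reports: a lower ASR overall can still hide a widening gap between high-resource and non-high-resource languages within one risk type.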
Problem

Research questions and friction points this paper is trying to address.

multilingual
vision-language models
safety evaluation
multimodal
red-teaming
Innovation

Methods, ideas, or system contributions that make the work stand out.

multilingual vision-language models
safety benchmark
image-text alignment
attack success rate
modality-aware safety