ShieldVLM: Safeguarding the Multimodal Implicit Toxicity via Deliberative Reasoning with LVLMs

📅 2025-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the challenge of detecting multimodal implicit toxicity, i.e., harmful meanings that arise only when individually benign images and text are interpreted jointly, this work introduces the first systematic, fine-grained taxonomy (7 risk categories, 31 sub-categories) and the first dedicated benchmark dataset, MMIT, comprising 2,100 samples spanning 5 typical cross-modal correlation modes. The authors propose ShieldVLM, a vision-language model that combines cross-modal alignment, deliberative reasoning, and risk source tracing (causal attribution) to detect implicit toxicity at the statement, prompt, and dialogue levels. Experiments show that ShieldVLM outperforms existing strong baselines on both implicit and explicit toxicity detection. The model and the MMIT dataset will be released publicly, providing foundational resources for multimodal content safety research.
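The summary names the pipeline's stages (cross-modal alignment, deliberative reasoning, risk source tracing), but this listing contains no code. As a purely illustrative sketch, the detection step can be framed as a staged deliberative prompt over any LVLM: describe each modality, reason about what the combination implies, then emit a verdict with an attribution. Here `lvlm_generate` is a hypothetical stand-in for an arbitrary LVLM inference call, and the prompt wording is our own, not ShieldVLM's.

```python
# Illustrative sketch of deliberative cross-modal toxicity detection.
# NOTE: `lvlm_generate(image, prompt) -> str` is a hypothetical stand-in for
# any LVLM inference call; this is NOT ShieldVLM's released implementation.

DELIBERATION_PROMPT = """You are a content-safety reviewer.
Step 1: Describe the literal content of the image.
Step 2: Restate the accompanying text: "{text}"
Step 3: Reason about what the image and text imply ONLY when combined,
even if each modality looks benign on its own.
Step 4: Output one line "VERDICT: SAFE" or "VERDICT: UNSAFE", followed by
a short risk attribution naming the harmful cross-modal cue, if any."""

def detect_implicit_toxicity(image, text, lvlm_generate):
    """Run staged deliberation on an (image, text) pair; return (is_unsafe, rationale)."""
    rationale = lvlm_generate(image, DELIBERATION_PROMPT.format(text=text))
    for line in rationale.splitlines():
        stripped = line.strip().upper()
        if stripped.startswith("VERDICT:"):
            verdict = stripped.split(":", 1)[1].strip()
            return verdict.startswith("UNSAFE"), rationale
    # Conservative fallback: no parseable verdict means escalate to human review.
    return True, rationale
```

The conservative fallback (flagging any response without a parseable verdict) is a design choice of this sketch, not something the paper specifies.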

📝 Abstract
Toxicity detection in multimodal text-image content faces growing challenges, especially with multimodal implicit toxicity, where each modality appears benign on its own but conveys harm when the two are combined. Multimodal implicit toxicity appears not only in statements posted on social platforms but also in prompts that can elicit toxic dialogs from Large Vision-Language Models (LVLMs). Despite successes in unimodal text and image moderation, toxicity detection for multimodal content, particularly multimodal implicit toxicity, remains underexplored. To fill this gap, we build a comprehensive taxonomy for multimodal implicit toxicity (MMIT) and introduce the MMIT-dataset, comprising 2,100 multimodal statements and prompts across 7 risk categories (31 sub-categories) and 5 typical cross-modal correlation modes. To advance the detection of multimodal implicit toxicity, we build ShieldVLM, a model that identifies implicit toxicity in multimodal statements, prompts, and dialogs via deliberative cross-modal reasoning. Experiments show that ShieldVLM outperforms existing strong baselines in detecting both implicit and explicit toxicity. The model and dataset will be publicly available to support future research. Warning: This paper contains potentially sensitive content.
Problem

Research questions and friction points this paper is trying to address.

Detecting implicit toxicity in multimodal text-image content
Addressing the underexplored case where individually benign modalities become toxic in combination
Improving detection of prompts that elicit toxic dialogs from LVLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comprehensive taxonomy for multimodal implicit toxicity
Introduces the MMIT-dataset of 2,100 multimodal statements and prompts
ShieldVLM detects implicit toxicity via deliberative cross-modal reasoning
Authors

Shiyao Cui, Tsinghua University
Qinglin Zhang, The Conversational AI (CoAI) group, DCST, Tsinghua University, China
Ouyang Xuan, The Conversational AI (CoAI) group, DCST, Tsinghua University, China
Renmiao Chen, The Conversational AI (CoAI) group, DCST, Tsinghua University, China
Zhexin Zhang, Tsinghua University, CoAI Group (NLP, AI Safety & Alignment)
Yida Lu, Tsinghua University, CoAI Group (NLP, AI Safety & Alignment)
Hongning Wang, Associate Professor, Department of Computer Science and Technology, Tsinghua University (Machine Learning, Information Retrieval, Large Language Models)
Han Qiu, NTU
Minlie Huang, The Conversational AI (CoAI) group, DCST, Tsinghua University, China