ELITE: Enhanced Language-Image Toxicity Evaluation for Safety

📅 2025-02-07

📈 Citations: 0

✨ Influential: 0

career value

232K/year

🤖 AI Summary

Existing VLM safety benchmarks suffer from weak harmfulness, ambiguous annotations, and limited image-text pair diversity, while automated evaluation struggles to detect latent harms. To address these limitations, we propose ELITE—the first high-quality multimodal benchmark and accompanying evaluator specifically designed for VLM safety assessment. Methodologically, ELITE introduces an explicit toxicity modeling paradigm that integrates multimodal toxicity scoring, image-text alignment filtering, and adversarial image-text pairing generation, calibrated via human-evaluator consistency to ensure strong alignment between automatic and human judgments. Experiments demonstrate that the ELITE evaluator achieves significantly higher correlation with human annotations than prior methods. Its benchmark covers a broader spectrum of harm categories and realistic scenarios, substantially enhancing harmfulness intensity, semantic clarity, and image-text diversity—thereby establishing a new standard for VLM safety evaluation.

Technology Category

Application Category

📝 Abstract

Current Vision Language Models (VLMs) remain vulnerable to malicious prompts that induce harmful outputs. Existing safety benchmarks for VLMs primarily rely on automated evaluation methods, but these methods struggle to detect implicit harmful content or produce inaccurate evaluations. Therefore, we found that existing benchmarks have low levels of harmfulness, ambiguous data, and limited diversity in image-text pair combinations. To address these issues, we propose the ELITE {em benchmark}, a high-quality safety evaluation benchmark for VLMs, underpinned by our enhanced evaluation method, the ELITE {em evaluator}. The ELITE evaluator explicitly incorporates a toxicity score to accurately assess harmfulness in multimodal contexts, where VLMs often provide specific, convincing, but unharmful descriptions of images. We filter out ambiguous and low-quality image-text pairs from existing benchmarks using the ELITE evaluator and generate diverse combinations of safe and unsafe image-text pairs. Our experiments demonstrate that the ELITE evaluator achieves superior alignment with human evaluations compared to prior automated methods, and the ELITE benchmark offers enhanced benchmark quality and diversity. By introducing ELITE, we pave the way for safer, more robust VLMs, contributing essential tools for evaluating and mitigating safety risks in real-world applications.

Problem

Research questions and friction points this paper is trying to address.

Detect implicit harmful content in VLMs

Improve evaluation accuracy of VLM safety

Enhance diversity in image-text pair benchmarks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhanced toxicity score

Filtered image-text pairs

Diverse safe-unsafe combinations

🔎 Similar Papers

No similar papers found.