Automated Safety Benchmarking: A Multi-agent Pipeline for LVLMs

📅 2026-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing safety evaluation benchmarks for large vision-language models (LVLMs) are time-consuming to construct, static in nature, and lack sufficient discriminative power to keep pace with rapid model evolution and emerging risks. To address these limitations, this work proposes VLSafetyBencher—the first automated system for constructing safety evaluation benchmarks tailored to LVLMs. By orchestrating four collaborative agents responsible for data preprocessing, generation, augmentation, and selection, VLSafetyBencher enables fully automated, efficient production of high-quality test samples. The approach substantially enhances the benchmark’s dynamism, scalability, and discriminative capability, yielding—in under a week and at minimal cost—a benchmark that effectively reveals up to 70% performance disparity in safety behavior across different LVLMs.
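The four-agent flow described above (preprocessing → generation → augmentation → selection) can be sketched as a simple sequential pipeline. This is a hypothetical illustration only: the agent names, interfaces, and scoring are placeholders invented here, not VLSafetyBencher's actual implementation, and the real system would back each stage with an LLM/LVLM rather than these stubs.

```python
# Hypothetical sketch of a four-agent benchmark-construction pipeline.
# All names and the stand-in scoring are illustrative assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class Sample:
    prompt: str
    risk_tag: str
    score: float = 0.0

def preprocess(raw: List[str]) -> List[Sample]:
    # Data Preprocessing agent: normalize raw seeds into tagged samples.
    return [Sample(prompt=r.strip(), risk_tag="unsafe-query") for r in raw if r.strip()]

def generate(samples: List[Sample]) -> List[Sample]:
    # Generation agent: expand each seed into candidate test cases
    # (here a trivial paraphrase stub stands in for LLM generation).
    return samples + [Sample(s.prompt + " (rephrased)", s.risk_tag) for s in samples]

def augment(samples: List[Sample]) -> List[Sample]:
    # Augmentation agent: adversarially perturb candidates; a real system
    # would apply jailbreak templates or image-text perturbations.
    for s in samples:
        s.score = len(s.prompt) / 100.0  # stand-in difficulty estimate
    return samples

def select(samples: List[Sample], k: int) -> List[Sample]:
    # Selection agent: keep the k most discriminative samples by score.
    return sorted(samples, key=lambda s: s.score, reverse=True)[:k]

def build_benchmark(raw: List[str], k: int = 2) -> List[Sample]:
    # Orchestrate the four agents in sequence.
    return select(augment(generate(preprocess(raw))), k)
```

In a real deployment each stage would run as an independent agent with its own model calls and feedback loops; the point here is only the staged hand-off of samples from one agent to the next.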

📝 Abstract
Large vision-language models (LVLMs) exhibit remarkable capabilities in cross-modal tasks but face significant safety challenges, which undermine their reliability in real-world applications. Efforts have been made to build LVLM safety evaluation benchmarks to uncover their vulnerabilities. However, existing benchmarks are hindered by their labor-intensive construction process, static complexity, and limited discriminative power. Thus, they may fail to keep pace with rapidly evolving models and emerging risks. To address these limitations, we propose VLSafetyBencher, the first automated system for LVLM safety benchmarking. VLSafetyBencher introduces four collaborative agents: Data Preprocessing, Generation, Augmentation, and Selection agents to construct and select high-quality samples. Experiments validate that VLSafetyBencher can construct high-quality safety benchmarks within one week at minimal cost. The generated benchmark effectively distinguishes model safety, with a safety rate disparity of 70% between the most and least safe models.
Problem

Research questions and friction points this paper is trying to address.

LVLM safety
benchmarking
automated evaluation
safety challenges
discriminative power
Innovation

Methods, ideas, or system contributions that make the work stand out.

automated benchmarking
multi-agent system
LVLM safety
dynamic evaluation
adversarial sample generation