LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models

📅 2024-06-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address security and compliance risks posed by image/video data and generative models, this paper introduces an open-source, vision-language model (VLM)-based safety framework. Methodologically, it proposes a multi-task joint modeling approach integrating expert-annotated fine-tuning, semantics-enhanced data augmentation, and configurable classification and reasoning modules. Key contributions include: (1) releasing a high-quality multimodal safety dataset featuring fine-grained categories, expert-provided risk ratings, and attribution rationales; (2) designing a lightweight adaptation strategy enabling efficient deployment of VLMs ranging from 0.5B to 7B parameters; and (3) achieving state-of-the-art accuracy in safety evaluation, with demonstrated applications to large-scale automated dataset annotation and real-time content moderation in text-to-image generation systems. All code, datasets, and model weights are publicly released.
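The summary above describes per-image safety annotations consisting of a rating, a fine-grained category, and a rationale. A minimal sketch of how such a structured model output could be parsed into a typed record follows; the JSON field names and values are illustrative assumptions, not the paper's exact output format.

```python
import json
from dataclasses import dataclass


@dataclass
class SafetyAssessment:
    """Hypothetical record mirroring the annotations described above."""
    rating: str      # e.g. "Safe" or "Unsafe" (illustrative labels)
    category: str    # fine-grained safety category
    rationale: str   # free-text justification for the rating


def parse_assessment(raw: str) -> SafetyAssessment:
    """Parse a JSON string emitted by a VLM safeguard into a typed record."""
    obj = json.loads(raw)
    return SafetyAssessment(obj["rating"], obj["category"], obj["rationale"])


# Example output string a safeguard model might produce (hypothetical):
example = '{"rating": "Unsafe", "category": "Weapons", "rationale": "Depicts a firearm."}'
assessment = parse_assessment(example)
```

Keeping the output machine-parseable like this is what makes automated large-scale annotation pipelines practical.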

📝 Abstract
This paper introduces LlavaGuard, a suite of VLM-based vision safeguards that address the critical need for reliable guardrails in the era of large-scale data and models. To this end, we establish a novel open framework, describing a customizable safety taxonomy, data preprocessing, augmentation, and training setup. To teach a VLM safeguard about safety, we further create a multimodal safety dataset with high-quality human expert annotations, where each image is labeled with a safety rating, category, and rationale. We also employ advanced augmentations to support context-specific assessments. The resulting LlavaGuard models, ranging from 0.5B to 7B parameters, serve as a versatile tool for evaluating the safety compliance of visual content against flexible policies. In comprehensive experiments, LlavaGuard outperforms both state-of-the-art safeguards and VLMs in accuracy and in flexibly handling different policies. Additionally, we demonstrate LlavaGuard's performance in two real-world applications: large-scale dataset annotation and moderation of text-to-image models. We make our entire framework publicly available, including the dataset and model weights.
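The abstract emphasizes evaluating visual content against flexible, customizable policies. The idea can be sketched as a compliance check that takes the safeguard's assessment together with a policy; here the policy is reduced to a set of disallowed categories, which is a simplifying assumption (LlavaGuard's actual taxonomy is richer and the category names below are invented for illustration).

```python
def is_compliant(rating: str, category: str, disallowed: set[str]) -> bool:
    """An image complies if it is rated safe, or if its (unsafe) category
    is still permitted under the given customizable policy."""
    return rating == "Safe" or category not in disallowed


# Two hypothetical policies differing only in which categories they forbid.
strict_policy = {"Weapons", "Hate Symbols"}
lenient_policy = {"Hate Symbols"}

# The same assessment can be compliant under one policy but not another,
# which is the flexibility the framework's configurable taxonomy targets.
strict_ok = is_compliant("Unsafe", "Weapons", strict_policy)    # False
lenient_ok = is_compliant("Unsafe", "Weapons", lenient_policy)  # True
```

Separating the assessment (produced once per image) from the policy (applied at decision time) is what allows one safeguard model to serve deployments with different moderation rules.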
Problem

Research questions and friction points this paper is trying to address.

Data Security
Image Protection
Video Security
Innovation

Methods, ideas, or system contributions that make the work stand out.

LlavaGuard
Context-aware Image Assessment
Scalable Safety Models