🤖 AI Summary
To address the challenge of image moderation under diverse regulatory requirements—such as cultural sensitivity and child protection—this paper proposes a rule-driven, interpretable moderation framework. Methodologically, it introduces a novel rule decomposition and multi-stage prompt-driven annotation augmentation paradigm, enabling the construction of ICM-Instruct—the first fine-grained, rule-aligned instruction dataset featuring explanatory rationales and QA pairs. Leveraging multimodal large language models (MLLMs), the framework integrates explicit image annotations, structured rule parsing, and multi-turn reasoning prompts via instruction tuning. Experiments across multiple test sources demonstrate average improvements of 36.8% in classification accuracy and 26.6% in explanation quality over state-of-the-art MLLM-based approaches, with plug-and-play industrial deployability. Key contributions include: (1) the first high-quality, rule–image-aligned instruction dataset; (2) an interpretable rule-execution mechanism; and (3) a lightweight adaptation paradigm for cross-cultural compliance.
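The rule-decomposition and QA-pair construction described above can be sketched as follows. This is a minimal illustrative toy, not the paper's actual pipeline: the function names, the semicolon-splitting heuristic, and the Q-A item schema are all assumptions made for clarity.

```python
# Toy sketch of rule decomposition + moderation Q-A pair construction.
# All names and the splitting heuristic are illustrative assumptions,
# not the ICM-Instruct implementation.

def decompose_rule(rule: str) -> list[str]:
    """Split a concise human-defined rule into checkable sub-rules (toy heuristic)."""
    return [part.strip() for part in rule.split(";") if part.strip()]

def build_qa_pairs(image_annotation: str, sub_rules: list[str]) -> list[dict]:
    """Pair each sub-rule with an image annotation to form moderation Q-A items."""
    pairs = []
    for sub_rule in sub_rules:
        pairs.append({
            "question": f"Does the image violate the rule: '{sub_rule}'?",
            "context": image_annotation,
            # the rationale field is what makes the decision interpretable
            "answer_format": "decision + explanation",
        })
    return pairs

rule = "no depiction of minors in unsafe settings; no culturally insensitive symbols"
annotation = "a street scene with adults holding flags"
dataset_items = build_qa_pairs(annotation, decompose_rule(rule))
print(len(dataset_items))  # one Q-A item per sub-rule -> 2
```

In the actual framework, the enrichment of the short image annotations and the answers themselves would come from multi-stage MLLM prompting rather than templates; the sketch only shows how one rule fans out into multiple rule-aligned Q-A items.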
📝 Abstract
Controversial content inundates the Internet, infringing various cultural norms and child-protection standards. Traditional Image Content Moderation (ICM) models fall short of producing precise moderation decisions for diverse standards, while recent multimodal large language models (MLLMs), when applied to general rule-based ICM, often produce classification and explanation results that are inconsistent with those of human moderators. Aiming at flexible, explainable, and accurate ICM, we design a novel rule-based dataset-generation pipeline that decomposes concise human-defined rules and leverages well-designed multi-stage prompts to enrich short explicit image annotations. The resulting ICM-Instruct dataset includes detailed moderation explanations and moderation Q-A pairs. Built upon it, we create our ICM-Assistant model within the framework of rule-based ICM, making it readily applicable in real-world practice. Our ICM-Assistant model demonstrates exceptional performance and flexibility: it significantly outperforms existing approaches across various sources, consistently improving both moderation classification (by 36.8% on average) and moderation explanation quality (by 26.6% on average) over existing MLLMs. Code/Data is available at https://github.com/zhaoyuzhi/ICM-Assistant.