🤖 AI Summary
This study addresses the deceptive practices prevalent in multimodal advertisements on short-video platforms, where manipulation often arises from coordinated misuse of visual, audio, and textual modalities. To combat this, the authors propose a policy-driven, rule-guided multitask auditing framework that integrates chain-of-thought reasoning, multimodal alignment, and reinforcement learning to detect both intra-modal manipulation and cross-modal inconsistencies. A novel rule-based In-Context Chain-of-Thought (ICoT) data synthesis pipeline is introduced to drastically reduce annotation costs. The framework further employs a composite reward mechanism that jointly optimizes causal coherence and regulatory compliance. Evaluated on real-world advertising data, the model significantly outperforms strong baselines in accuracy, consistency, and generalization, while maintaining high interpretability and robustness.
📝 Abstract
Short-video platforms now host vast numbers of multimodal ads whose deceptive visuals, speech, and subtitles demand finer-grained, policy-driven moderation than general community-safety filters provide. We present BLM-Guard, a content-audit framework for commercial ads that fuses Chain-of-Thought reasoning with rule-based policy principles and a critic-guided reward. A rule-driven In-Context Chain-of-Thought (ICoT) data-synthesis pipeline jump-starts training by generating structured scene descriptions, reasoning chains, and labels, cutting annotation costs. Reinforcement learning then refines the model using a composite reward that balances causal coherence with policy adherence. A multitask architecture models both intra-modal manipulations (e.g., exaggerated imagery) and cross-modal mismatches (e.g., subtitle-speech drift), boosting robustness. Experiments on real short-video ads show that BLM-Guard surpasses strong baselines in accuracy, consistency, and generalization.
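The abstract does not specify the exact form of the composite reward, only that it balances causal coherence with policy adherence. As a rough illustration, one common way to combine two such signals is a weighted blend; the function name, score inputs, and the `alpha` weight below are assumptions for exposition, not details from the paper:

```python
def composite_reward(coherence_score: float,
                     compliance_score: float,
                     alpha: float = 0.5) -> float:
    """Hypothetical sketch: blend a causal-coherence critic score with a
    policy-compliance score. Both scores are assumed to lie in [0, 1];
    alpha is an assumed mixing hyperparameter, not taken from the paper."""
    assert 0.0 <= alpha <= 1.0
    return alpha * coherence_score + (1.0 - alpha) * compliance_score
```

In an RL fine-tuning loop, a scalar reward of this shape would be emitted per generated audit trace, so tuning `alpha` trades off how strongly the policy is pushed toward coherent reasoning chains versus strict regulatory compliance.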