BLM-Guard: Explainable Multimodal Ad Moderation with Chain-of-Thought and Policy-Aligned Rewards

📅 2026-02-20
🤖 AI Summary
This study addresses deceptive practices in multimodal advertisements on short-video platforms, where manipulation often arises from coordinated misuse of the visual, audio, and textual modalities. To counter this, the authors propose a policy-driven, rule-guided multitask auditing framework that combines chain-of-thought reasoning, multimodal alignment, and reinforcement learning to detect both intra-modal manipulation and cross-modal inconsistencies. A rule-based In-Context Chain-of-Thought (ICoT) data synthesis pipeline generates structured training data and reduces annotation costs. The framework also uses a composite reward that jointly optimizes causal coherence and policy adherence. Evaluated on real short-video advertising data, the model outperforms strong baselines in accuracy, consistency, and generalization, while its reasoning chains keep decisions explainable.

📝 Abstract
Short-video platforms now host vast multimodal ads whose deceptive visuals, speech and subtitles demand finer-grained, policy-driven moderation than community safety filters. We present BLM-Guard, a content-audit framework for commercial ads that fuses Chain-of-Thought reasoning with rule-based policy principles and a critic-guided reward. A rule-driven ICoT data-synthesis pipeline jump-starts training by generating structured scene descriptions, reasoning chains and labels, cutting annotation costs. Reinforcement learning then refines the model using a composite reward balancing causal coherence with policy adherence. A multitask architecture models intra-modal manipulations (e.g., exaggerated imagery) and cross-modal mismatches (e.g., subtitle-speech drift), boosting robustness. Experiments on real short-video ads show BLM-Guard surpasses strong baselines in accuracy, consistency and generalization.
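
The abstract describes a composite reward that balances causal coherence with policy adherence but does not give its form. As a minimal sketch only, one way such a balance could be expressed is a normalized weighted average of two scores in [0, 1]. The names `RewardWeights` and `composite_reward`, the equal default weights, and the weighted-average form itself are all assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical mixing weights; the source does not state any values."""
    coherence: float = 0.5
    policy: float = 0.5


def composite_reward(coherence_score: float,
                     policy_score: float,
                     weights: RewardWeights = RewardWeights()) -> float:
    """Blend two scores in [0, 1] into one scalar via a normalized weighted average.

    This is an illustrative assumption, not the reward defined by the authors.
    """
    for name, value in (("coherence_score", coherence_score),
                        ("policy_score", policy_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    total = weights.coherence + weights.policy
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (weights.coherence * coherence_score
            + weights.policy * policy_score) / total
```

With this form, a reasoning chain that is coherent but violates policy (scores 1.0 and 0.0) receives only half the maximum reward under equal weights, so neither objective can be satisfied at the expense of the other.
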
Problem

Research questions and friction points this paper is trying to address.

multimodal ad moderation
deceptive content
policy-driven moderation
short-video ads
cross-modal mismatch
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Thought Reasoning
Policy-Aligned Rewards
Multimodal Ad Moderation
Rule-Driven Data Synthesis
Reinforcement Learning

Authors

Yiran Yang (University of Chinese Academy of Sciences): Object detection, AIGC, Knowledge Distillation
Zhaowei Liu (Kuaishou Technology)
Yuan Yuan (Kuaishou Technology)
Yukun Song (Kuaishou Technology, Beijing University of Posts and Telecommunications)
Xiong Ma (Kuaishou Technology)
Yinghao Song (Kuaishou Technology)
Xiangji Zeng (Kuaishou Technology)
Lu Sun (University of Massachusetts, Amherst): optoelectronics, microelectronics, protein nanowire
Yulu Wang (University of Maryland): Information Retrieval
Hai Zhou (Northwestern University): EDA, logic locking, hardware security, physical design, VLSI
Shuai Cui (Kuaishou Technology, Shandong University)
Zhaohan Gong (Kuaishou Technology)
Jiefei Zhang (Kuaishou Technology)