RAVEN: Robust Advertisement Video Violation Temporal Grounding via Reinforcement Reasoning

📅 2025-10-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Advertising video violation detection suffers from inaccurate temporal localization, noisy annotations, and poor generalization. This paper proposes RAVEN, an end-to-end approach that integrates multimodal large language models with reinforcement learning. It combines a progressive curriculum reinforcement learning framework with Group Relative Policy Optimization (GRPO), which elicits the model's intrinsic reasoning capability without requiring explicit reasoning annotations. A hierarchical reward mechanism jointly optimizes violation classification and precise temporal localization. On both industrial datasets and public benchmarks, the method improves violation category accuracy and temporal interval localization, and it mitigates the catastrophic forgetting associated with supervised fine-tuning. Online A/B testing shows significant gains in both precision and recall, and the authors describe a pipeline for deploying the system on online Ad services.

📝 Abstract

Advertisement (Ad) video violation detection is critical for ensuring platform compliance, but existing methods struggle with precise temporal grounding, noisy annotations, and limited generalization. We propose RAVEN, a novel framework that integrates curriculum reinforcement learning with multimodal large language models (MLLMs) to enhance reasoning and cognitive capabilities for violation detection. RAVEN employs a progressive training strategy that combines precisely and coarsely annotated data, and leverages Group Relative Policy Optimization (GRPO) to develop emergent reasoning abilities without explicit reasoning annotations. Multiple sophisticated hierarchical reward mechanisms ensure precise temporal grounding and consistent category prediction. Experiments on industrial datasets and public benchmarks show that RAVEN achieves superior performance in violation category accuracy and temporal interval localization. We also design a pipeline to deploy RAVEN on online Ad services, and online A/B testing further validates its practical applicability, with significant improvements in precision and recall. RAVEN also demonstrates strong generalization, mitigating the catastrophic forgetting issue associated with supervised fine-tuning.
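
The abstract names GRPO but does not spell out how it is applied in RAVEN. As a rough illustration of the general idea only, the sketch below shows the group-relative advantage step commonly associated with GRPO: several responses are sampled for the same input, and each response's reward is normalized against the mean and standard deviation of its own group, so no separate value model is needed. The function name, the `eps` constant, and the use of the population standard deviation are assumptions made for this example, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other samples in the same group.

    `rewards` holds one scalar reward per sampled response to a single
    prompt. `eps` (an assumed constant) avoids division by zero when all
    rewards in the group are identical.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled answers for one ad video, scored by some reward.
advantages = group_relative_advantages([2.0, 0.0, 1.0, 1.0])
```

Responses that score above their group's average receive a positive advantage and are reinforced; those below average are discouraged. When every response in a group earns the same reward, all advantages are zero and the group contributes no learning signal.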

Problem

Research questions and friction points this paper is trying to address.

Improving temporal grounding precision for ad violations

Addressing noisy annotations and limited generalization issues

Enhancing reasoning capabilities without explicit supervision

Innovation

Methods, ideas, or system contributions that make the work stand out.

Curriculum reinforcement learning with multimodal LLMs

Progressive training using mixed annotation data

Hierarchical reward mechanism for temporal grounding
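
The page does not give the reward formulas, so the following is only a minimal sketch of what a hierarchical reward for temporal grounding could look like, under stated assumptions: the localization term is a temporal IoU between predicted and annotated intervals, and it is granted only when the predicted violation category is correct. The gating rule, the weights `w_cat` and `w_iou`, and all function names are hypothetical and are not taken from RAVEN.

```python
def temporal_iou(pred, gold):
    """Intersection over union of two (start, end) intervals in seconds."""
    ps, pe = pred
    gs, ge = gold
    if pe <= ps or ge <= gs:
        return 0.0  # empty or inverted interval
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union


def hierarchical_reward(pred_category, pred_span, gold_category, gold_span,
                        w_cat=1.0, w_iou=1.0):
    """Assumed two-level reward: category first, then localization.

    A wrong category earns nothing, so the temporal term can only add
    reward on top of a correct classification.
    """
    if pred_category != gold_category:
        return 0.0
    return w_cat + w_iou * temporal_iou(pred_span, gold_span)


# Correct category, prediction overlaps half of the annotated interval.
reward = hierarchical_reward("misleading_claim", (0.0, 10.0),
                             "misleading_claim", (5.0, 15.0))
```

Gating the localization term on the category is one way to keep the two objectives consistent: a tightly localized span attached to the wrong violation type is not rewarded. For coarsely annotated samples that lack reliable intervals, a variant could keep only the category term.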

🔎 Similar Papers

Detecting AI-Generated Video via Frame Consistency