🤖 AI Summary
Video ad moderation faces three core challenges: imprecise fine-grained violation localization, weak interpretability, and limited generalization. To address these, we propose an active reinforcement reasoning framework for fine-grained violation detection. Our approach integrates active reinforcement learning, a hierarchical reward mechanism, reasoning distillation, and a curriculum-driven multi-stage training strategy—collectively enhancing the model’s precision in localizing violations within complex ad semantics and its capacity for interpretable, stepwise reasoning. Extensive evaluations on multiple public and proprietary benchmarks demonstrate that our method consistently outperforms general-purpose large language models (LLMs) and state-of-the-art specialized models (e.g., RAVEN). Both offline ablation studies and online A/B tests confirm its superior fine-grained comprehension, robust generalization across diverse ad domains, and tangible business impact in production deployment.
📝 Abstract
Advertising (Ad) is a cornerstone of the digital economy, yet the moderation of video advertisements remains a significant challenge due to their complexity and the need for precise violation localization. While recent advancements, such as the RAVEN model, have improved coarse-grained violation detection, critical gaps persist in fine-grained understanding, explainability, and generalization. To address these limitations, we propose RAVEN++, a novel framework that introduces three key innovations: 1) Active Reinforcement Learning (RL), which dynamically adapts training to samples of varying difficulty; 2) Fine-Grained Violation Understanding, achieved through hierarchical reward functions and reasoning distillation; and 3) Progressive Multi-Stage Training, which systematically combines knowledge injection, curriculum-based passive RL, and active RL. Extensive experiments on both public and proprietary datasets, covering both offline evaluation and online deployed A/B testing, demonstrate that RAVEN++ outperforms general-purpose LLMs and specialized models like RAVEN in fine-grained violation understanding, reasoning capability, and generalization.
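To make the two central mechanisms more concrete, the sketch below illustrates the general idea of difficulty-adaptive active RL paired with a hierarchical reward. This is a minimal toy illustration under our own assumptions, not RAVEN++'s actual implementation: the reward weights, the sample schema (`category` for the coarse violation class, `label` for the fine-grained one), and the function names are all hypothetical.

```python
# Toy sketch (hypothetical, not the paper's code): an active-RL loop selects
# the samples on which the current policy earns the LOWEST hierarchical
# reward, so training focuses on the hardest cases.

def hierarchical_reward(pred, gold):
    """Toy hierarchical reward: matching the coarse violation category
    earns 0.5; matching the exact fine-grained label earns another 0.5."""
    coarse = 0.5 if pred["category"] == gold["category"] else 0.0
    fine = 0.5 if pred["label"] == gold["label"] else 0.0
    return coarse + fine

def select_active_batch(samples, predict, batch_size):
    """Score each sample with the current policy's reward and return the
    batch_size lowest-reward (hardest) samples for the next update."""
    scored = [(hierarchical_reward(predict(s), s["gold"]), s) for s in samples]
    scored.sort(key=lambda t: t[0])  # ascending reward: hardest first
    return [s for _, s in scored[:batch_size]]

# Example: a frozen toy policy that always predicts one violation type.
policy = lambda s: {"category": "violence", "label": "weapon"}
pool = [
    {"id": 1, "gold": {"category": "violence", "label": "weapon"}},  # reward 1.0
    {"id": 2, "gold": {"category": "violence", "label": "blood"}},   # reward 0.5
    {"id": 3, "gold": {"category": "adult", "label": "nudity"}},     # reward 0.0
]
hardest = select_active_batch(pool, policy, batch_size=2)
# hardest contains ids 3 and 2: the samples the policy handles worst.
```

The design point mirrored here is the contrast with passive (uniform-sampling) RL: by scoring the pool with the reward itself, the sampler adapts automatically as the policy improves, which is the intuition behind the curriculum-then-active training progression the abstract describes.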