🤖 AI Summary
To address the challenges of dynamic, multimodal, and hard-to-detect harmful content in live-streaming scenarios, this paper proposes a dual-channel hybrid moderation framework: (1) a supervised classification channel and (2) an MLLM-enhanced reference-based similarity matching channel. Leveraging knowledge distillation from multimodal large language models (text, audio, and vision), the framework achieves lightweight inference while jointly supporting both detection of known policy violations and discovery of novel, covert malicious behaviors. The classification channel achieves 67% recall at 80% precision; the similarity channel attains 76% recall at the same precision. Large-scale A/B testing demonstrates a 6–8% reduction in user exposure to harmful live streams. The core innovation lies in a dynamically coordinated mechanism integrating supervised learning with unsupervised similarity matching, significantly improving robustness, generalization, and scalability.
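The dual-channel coordination described above can be illustrated with a minimal sketch: a supervised classifier score handles known violation types, while cosine similarity against a curated reference set of confirmed harmful streams catches novel variants. The function names, thresholds, and reference set here are illustrative assumptions, not details from the paper.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def moderate(classifier_score: float,
             embedding: np.ndarray,
             reference_embeddings: list,
             cls_threshold: float = 0.8,    # hypothetical operating point
             sim_threshold: float = 0.9) -> str:
    # Channel 1: supervised classifier flags known policy violations.
    if classifier_score >= cls_threshold:
        return "flag:known_violation"
    # Channel 2: reference-based similarity matching flags streams that
    # closely resemble previously confirmed harmful content, even when
    # the classifier misses them (novel or covert behaviors).
    if reference_embeddings:
        best = max(cosine_similarity(embedding, r) for r in reference_embeddings)
        if best >= sim_threshold:
            return "flag:reference_match"
    return "pass"
```

In the deployed system, both channels consume MLLM-distilled multimodal embeddings, so this per-stream decision stays lightweight at inference time.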
📝 Abstract
Content moderation remains a critical yet challenging task for large-scale user-generated video platforms, especially in livestreaming environments where moderation must be timely, multimodal, and robust to evolving forms of unwanted content. We present a hybrid moderation framework deployed at production scale that combines supervised classification for known violations with reference-based similarity matching for novel or subtle cases. This hybrid design enables robust detection of both explicit violations and novel edge cases that evade traditional classifiers. Multimodal inputs (text, audio, visual) are processed through both pipelines, with a multimodal large language model (MLLM) distilling knowledge into each to boost accuracy while keeping inference lightweight. In production, the classification pipeline achieves 67% recall at 80% precision, and the similarity pipeline achieves 76% recall at 80% precision. Large-scale A/B tests show a 6–8% reduction in user views of unwanted livestreams. These results demonstrate a scalable and adaptable approach to multimodal content governance, capable of addressing both explicit violations and emerging adversarial behaviors.