Mixture of Experts Guided by Gaussian Splatters Matters: A new Approach to Weakly-Supervised Video Anomaly Detection

📅 2025-08-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Weakly supervised video anomaly detection (WSVAD) struggles with complex, multi-stage anomalies (e.g., shoplifting) because of two bottlenecks: (1) a single shared model fails to capture the diversity of anomaly types, and (2) video-level labels carry no precise temporal information, so anomalies blended with normal events are hard to localize. To address these, we propose Gaussian Splatting-guided Mixture of Experts (GS-MoE). GS-MoE trains a set of experts, each specialized in a specific anomaly type, under a temporal Gaussian splatting loss that exploits temporal consistency and focuses on the segments most likely to contain abnormal events. A mixture-of-experts mechanism then integrates the experts' predictions. On UCF-Crime, GS-MoE reaches 91.58% AUC, a state-of-the-art result, and it also reports superior results on the XD-Violence and MSAD datasets.

📝 Abstract
Video Anomaly Detection (VAD) is a challenging task due to the variability of anomalous events and the limited availability of labeled data. Under the Weakly-Supervised VAD (WSVAD) paradigm, only video-level labels are provided during training, while predictions are made at the frame level. Although state-of-the-art models perform well on simple anomalies (e.g., explosions), they struggle with complex real-world events (e.g., shoplifting). This difficulty stems from two key issues: (1) the inability of current models to address the diversity of anomaly types, as they process all categories with a shared model, overlooking category-specific features; and (2) the weak supervision signal, which lacks precise temporal information, limiting the ability to capture nuanced anomalous patterns blended with normal events. To address these challenges, we propose Gaussian Splatting-guided Mixture of Experts (GS-MoE), a novel framework that employs a set of expert models, each specialized in capturing specific anomaly types. These experts are guided by a temporal Gaussian splatting loss, enabling the model to leverage temporal consistency and enhance weak supervision. The Gaussian splatting approach encourages a more precise and comprehensive representation of anomalies by focusing on temporal segments most likely to contain abnormal events. The predictions from these specialized experts are integrated through a mixture-of-experts mechanism to model complex relationships across diverse anomaly patterns. Our approach achieves state-of-the-art performance, with a 91.58% AUC on the UCF-Crime dataset, and demonstrates superior results on XD-Violence and MSAD datasets. By leveraging category-specific expertise and temporal guidance, GS-MoE sets a new benchmark for VAD under weak supervision.
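The temporal guidance described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the temporal Gaussian splatting loss is computed, so everything below is an assumption made for illustration: the hypothetical helpers `find_peaks`, `splat_gaussians`, and `mse`, the peak threshold, the kernel width `sigma`, and the choice of mean squared error. The idea shown is to take per-snippet anomaly scores, place one Gaussian kernel at each score peak, and use the resulting smooth curve as a dense target in place of the single video-level label.

```python
import math

def find_peaks(scores, threshold=0.5):
    """Return indices of local maxima whose score is at least `threshold`."""
    peaks = []
    for t, s in enumerate(scores):
        left = scores[t - 1] if t > 0 else float("-inf")
        right = scores[t + 1] if t < len(scores) - 1 else float("-inf")
        if s >= threshold and s > left and s >= right:
            peaks.append(t)
    return peaks

def splat_gaussians(scores, sigma=1.5, threshold=0.5):
    """Build a soft temporal target by placing one Gaussian at each score peak.

    Assumption: overlapping kernels are merged with a pointwise maximum.
    """
    peaks = find_peaks(scores, threshold)
    target = [0.0] * len(scores)
    for t in range(len(scores)):
        for p in peaks:
            value = math.exp(-((t - p) ** 2) / (2 * sigma ** 2))
            target[t] = max(target[t], value)
    return target

def mse(pred, target):
    """Mean squared error between predicted scores and the splatted target."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)

# Hypothetical per-snippet anomaly scores for one abnormal video
scores = [0.1, 0.2, 0.9, 0.3, 0.1, 0.1, 0.7, 0.2]
target = splat_gaussians(scores)
loss = mse(scores, target)
```

In this sketch a video with no score above the threshold yields an all-zero target, which matches the intuition that normal videos should receive no temporal emphasis. How the actual method detects peaks, sets kernel widths, and weights the loss is not stated in the abstract.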

Problem

Research questions and friction points this paper is trying to address.

Addressing diversity of anomaly types in weakly-supervised video detection
Enhancing weak supervision with temporal Gaussian splatting guidance
Improving detection of complex real-world anomalous events

Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian Splatting-guided Mixture of Experts
Specialized experts for anomaly types
Temporal Gaussian splatting loss
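
How specialized experts might be combined can be sketched as follows. The abstract says only that expert predictions are integrated through a mixture-of-experts mechanism, so the softmax gate, the hypothetical function names `softmax` and `fuse_experts`, and the example numbers are assumptions for illustration, not the paper's architecture. Each expert outputs an anomaly score for a snippet, and a gate assigns a weight to each expert before the scores are averaged.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of gate logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_experts(expert_scores, gate_logits):
    """Combine per-expert anomaly scores for one snippet with softmax gate weights."""
    weights = softmax(gate_logits)
    return sum(w * s for w, s in zip(weights, expert_scores))

# Hypothetical outputs of three class-specific experts for one snippet
expert_scores = [0.9, 0.2, 0.1]
# Hypothetical gate logits favoring the first expert
gate_logits = [2.0, 0.0, 0.0]
fused = fuse_experts(expert_scores, gate_logits)
```

Because the gate weights are non-negative and sum to one, the fused score always lies between the lowest and highest expert score. With equal gate logits it reduces to the plain average of the experts.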