Distilling Aggregated Knowledge for Weakly-Supervised Video Anomaly Detection

📅 2024-06-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address limited training data, video-level-only supervision, and extreme class imbalance in weakly supervised video anomaly detection, this paper proposes DAKD, a bi-level knowledge distillation framework paired with a disentangled cross-attention feature aggregation network. The method transfers representations aggregated from multiple teacher backbones into a single-backbone student model: the upper level mitigates label sparsity through video-level knowledge distillation, while the lower level fuses multi-source features and suppresses background interference through disentangled cross-attention. On UCF-Crime, ShanghaiTech, and XD-Violence, the approach reports improvements of 1.36%, 0.78%, and 7.02%, respectively, over existing methods, which the authors describe as state-of-the-art performance.

📝 Abstract
Video anomaly detection aims to develop automated models capable of identifying abnormal events in surveillance videos. The benchmark setup for this task is extremely challenging due to: i) the limited size of the training sets, ii) weak supervision provided in terms of video-level labels, and iii) intrinsic class imbalance induced by the scarcity of abnormal events. In this work, we show that distilling knowledge from aggregated representations of multiple backbones into a single-backbone Student model achieves state-of-the-art performance. In particular, we develop a bi-level distillation approach along with a novel disentangled cross-attention-based feature aggregation network. Our proposed approach, DAKD (Distilling Aggregated Knowledge with Disentangled Attention), demonstrates superior performance compared to existing methods across multiple benchmark datasets. Notably, we achieve significant improvements of 1.36%, 0.78%, and 7.02% on the UCF-Crime, ShanghaiTech, and XD-Violence datasets, respectively.
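The idea of distilling aggregated multi-backbone representations into a single student can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: it uses plain scaled dot-product cross-attention without learned projections (not the paper's disentangled variant), two hypothetical teachers with the same number of snippets and the same feature size, and a simple mean-squared-error distillation loss. The function names `cross_attention`, `aggregate`, and `mse` are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    # Plain scaled dot-product attention: each query row attends over all key rows.
    # Assumption: no learned projections, unlike a real attention layer.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def aggregate(teacher_a, teacher_b):
    # Hypothetical fusion rule: A attends to B, B attends to A, then average.
    a_to_b = cross_attention(teacher_a, teacher_b, teacher_b)
    b_to_a = cross_attention(teacher_b, teacher_a, teacher_a)
    return [[(x + y) / 2 for x, y in zip(ra, rb)]
            for ra, rb in zip(a_to_b, b_to_a)]

def mse(student, target):
    # Feature-level distillation loss: mean squared error over all entries.
    n = sum(len(row) for row in student)
    return sum((s - t) ** 2
               for rs, rt in zip(student, target)
               for s, t in zip(rs, rt)) / n

# Toy snippet features: 2 snippets, 2 feature dimensions per teacher.
teacher_a = [[1.0, 0.0], [0.0, 1.0]]
teacher_b = [[0.5, 0.5], [1.0, 0.0]]
target = aggregate(teacher_a, teacher_b)      # aggregated teacher representation
student = [[0.0, 0.0], [0.0, 0.0]]            # untrained student features
loss = mse(student, target)                   # quantity a student would minimize
print(target, loss)
```

In a real system the student would be a trainable network and this loss would be minimized by gradient descent alongside the weakly supervised detection objective; the sketch only shows how an aggregated target and a distillation loss fit together.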
Problem

Research questions and friction points this paper is trying to address.

Video Anomaly Detection
Limited Training Data
Rare Abnormal Events
Innovation

Methods, ideas, or system contributions that make the work stand out.

DAKD
Dual-Layer Learning
Video Anomaly Detection
Jash Dalvi
K J Somaiya Institute of Technology
Ali Dabouei
CTO of Neptune Technologies
Machine learning, Deep learning, Computer Vision
Gunjan Dhanuka
Carnegie Mellon University
Min Xu
Carnegie Mellon University