MTFL: multi-timescale feature learning for weakly-supervised anomaly detection in surveillance videos

📅 2024-10-08
🏛️ International Conference on Machine Vision
📈 Citations: 5
Influential: 0
🤖 AI Summary
To address limited fine-grained motion modeling and multi-scale contextual understanding in weakly supervised anomaly detection for surveillance videos, this paper proposes a framework that jointly models short-, medium-, and long-term temporal features. It uses multi-timescale tubelet sampling with a Video Swin Transformer to capture spatio-temporal dynamics. It also introduces weakly supervised contrastive learning and cross-dataset transfer adaptation to improve generalization. The authors build VADD, an extension of UCF-Crime covering 18 anomaly classes and 2,591 videos. Experiments report state-of-the-art performance on UCF-Crime (89.78% AUC) and results competitive with the state of the art on ShanghaiTech (95.32% AUC) and XD-Violence (84.57% AP).

📝 Abstract
Detection of anomalous events is relevant to public safety and requires a combination of fine-grained motion information and contextual events at variable timescales. To this end, we propose a Multi-Timescale Feature Learning (MTFL) method to enhance the representation of anomaly features. Short, medium, and long temporal tubelets are used to extract spatio-temporal video features with a Video Swin Transformer. Experimental results demonstrate that MTFL outperforms state-of-the-art methods on the UCF-Crime dataset, achieving an anomaly detection performance of 89.78% AUC. It also performs competitively with the state of the art, reaching 95.32% AUC on ShanghaiTech and 84.57% AP on XD-Violence. Furthermore, we generate an extended version of UCF-Crime for development and evaluation on a wider range of anomalies, the Video Anomaly Detection Dataset (VADD), comprising 2,591 videos in 18 classes with extensive coverage of realistic anomalies.
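The AUC and AP numbers quoted in the abstract are frame-level ranking metrics. As a minimal sketch, assuming per-frame anomaly scores and binary frame labels (1 = anomalous) are already available, they can be computed as shown here; the helper names `roc_auc` and `average_precision` are hypothetical and are not taken from the paper or from any specific library.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Frame-level ROC AUC via the Mann-Whitney statistic.

    Counts, over all (anomalous, normal) frame pairs, how often the anomalous
    frame scores higher; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both anomalous (1) and normal (0) frames")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AP as the mean precision at the rank of each anomalous frame.

    Frames are ranked by descending score; equal scores keep their original
    order (stable sort), which may differ from library tie handling.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("need at least one anomalous frame")
    return sum(precisions) / len(precisions)
```

The pairwise AUC loop is quadratic in the number of frames; it is written for clarity, and a sort-based rank computation scales better on full benchmark test sets.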
Problem

Detecting anomalies in surveillance videos for public safety
Learning multi-timescale features to enhance anomaly representation
Addressing variable temporal contexts in weakly-supervised video analysis
Innovation

Multi-timescale feature learning for anomaly detection
Video Swin Transformer extracts spatio-temporal features
Temporal tubelets capture short, medium, long timescales
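One way to picture short, medium, and long tubelets is as frame windows of different lengths that share a temporal center. The sketch below assumes lengths of 8, 16, and 32 frames, edge clamping at video boundaries, and concatenation as the fusion step; none of these details come from the source, and `encode` is a hypothetical placeholder for a Video Swin Transformer feature extractor.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical tubelet lengths in frames; the source does not state the values.
TIMESCALES: Dict[str, int] = {"short": 8, "medium": 16, "long": 32}


def sample_tubelet(num_frames: int, center: int, length: int) -> List[int]:
    """Return `length` frame indices centered on `center`.

    Indices are clamped to [0, num_frames - 1], so edge frames are repeated
    and every tubelet keeps the same length near the start or end of a video.
    """
    start = center - length // 2
    return [min(max(start + k, 0), num_frames - 1) for k in range(length)]


def multi_timescale_tubelets(num_frames: int, center: int) -> Dict[str, List[int]]:
    """Short, medium, and long tubelets that share the same temporal center."""
    return {name: sample_tubelet(num_frames, center, length)
            for name, length in TIMESCALES.items()}


def fuse_features(tubelets: Dict[str, List[int]],
                  encode: Callable[[Sequence[int]], List[float]]) -> List[float]:
    """Encode each tubelet and concatenate the features in a fixed order.

    `encode` is a hypothetical stand-in for a video backbone such as a
    Video Swin Transformer; concatenation is an assumed fusion strategy.
    """
    fused: List[float] = []
    for name in TIMESCALES:  # short, medium, long
        fused.extend(encode(tubelets[name]))
    return fused
```

Sharing one center keeps the three views aligned on the same moment, so the short tubelet carries fine-grained motion while the long one adds surrounding context.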