🤖 AI Summary
To address the limitations of fine-grained motion modeling and multi-scale contextual understanding in weakly supervised anomaly detection for surveillance videos, this paper proposes a collaborative learning framework that jointly models short-, medium-, and long-term temporal features. A multi-timescale tubelet sampling mechanism is designed and integrated with a Video Swin Transformer to capture spatiotemporal dynamics, and weakly supervised contrastive learning with cross-dataset transfer adaptation is introduced to improve generalization. The authors also construct VADD, a large-scale extension of the UCF-Crime dataset comprising 2,591 videos across 18 anomaly categories with broad coverage of realistic anomalies. Extensive experiments show state-of-the-art results on UCF-Crime (89.78% AUC) and complementary performance to existing methods on ShanghaiTech (95.32% AUC) and XD-Violence (84.57% AP).
📝 Abstract
Detection of anomalous events is relevant for public safety and requires a combination of fine-grained motion information and contextual events at variable time-scales. To this end, we propose a Multi-Timescale Feature Learning (MTFL) method to enhance the representation of anomaly features. Short, medium, and long temporal tubelets are employed to extract spatio-temporal video features using a Video Swin Transformer. Experimental results demonstrate that MTFL outperforms state-of-the-art methods on the UCF-Crime dataset, achieving an anomaly detection performance of 89.78% AUC. Moreover, it delivers performance complementary to the SotA, with 95.32% AUC on the ShanghaiTech and 84.57% AP on the XD-Violence dataset. Furthermore, we generate an extended version of UCF-Crime for development and evaluation on a wider range of anomalies, namely the Video Anomaly Detection Dataset (VADD), comprising 2,591 videos across 18 classes with extensive coverage of realistic anomalies.
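To make the multi-timescale idea concrete, below is a minimal sketch of how short-, medium-, and long-term tubelets could be sampled around a shared anchor frame and encoded with a Video Swin Transformer. The clip length, stride values, edge-clamped padding, concatenation-based fusion, and the use of torchvision's `swin3d_t` backbone are all illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of multi-timescale tubelet sampling with a Video Swin backbone.
# Assumptions (not from the paper): 16-frame clips, strides (1, 2, 4),
# edge-clamped indices, and feature fusion by simple concatenation.
import torch
from torchvision.models.video import swin3d_t

def sample_tubelet(video, anchor, length, stride):
    """Extract `length` frames centered on `anchor`, spaced by `stride`.

    video: (T, C, H, W) tensor of decoded frames.
    Returns a (C, length, H, W) clip in the layout 3D backbones expect.
    """
    T = video.shape[0]
    idx = anchor + stride * (torch.arange(length) - length // 2)
    idx = idx.clamp(0, T - 1)              # repeat edge frames near boundaries
    return video[idx].permute(1, 0, 2, 3)  # (T', C, H, W) -> (C, T', H, W)

backbone = swin3d_t(weights=None)          # Video Swin Transformer (tiny)
backbone.head = torch.nn.Identity()        # expose pooled features, drop classifier

video = torch.randn(128, 3, 224, 224)      # dummy 128-frame RGB video
anchor = 64

# Same clip length at three strides: larger strides widen the temporal
# context window, giving short-, medium-, and long-term tubelets.
clips = [sample_tubelet(video, anchor, length=16, stride=s) for s in (1, 2, 4)]
feats = [backbone(c.unsqueeze(0)) for c in clips]  # each (1, 768) for Swin3D-T
fused = torch.cat(feats, dim=1)                    # (1, 2304) multi-scale feature
```

A downstream weakly supervised scoring head (e.g., a multiple-instance ranking model) would then map `fused` to a per-segment anomaly score; the paper itself describes how MTFL actually combines the three timescales.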