Transformer Based Self-Context Aware Prediction for Few-Shot Anomaly Detection in Videos

📅 2022-10-16
🏛️ IEEE International Conference on Image Processing (ICIP)
📈 Citations: 6
✨ Influential: 0
📄 PDF
🤖 AI Summary
Video anomaly detection faces challenges from diverse anomaly types and severe scarcity of labeled anomalies. This paper proposes a self-context-aware one-class few-shot Transformer framework that trains video-specific models using only the initial normal frames of each video. Leveraging self-supervised temporal attention, the model predicts subsequent frame features and localizes anomalies at the frame level via prediction–ground-truth feature residuals. Crucially, it requires no anomalous samples, enabling both video-specific modeling and dynamic contextual adaptation. The core innovation lies in deeply integrating self-attention with one-class few-shot temporal forecasting to establish an end-to-end prediction-residual detection paradigm. Experiments demonstrate improvements over state-of-the-art methods across multiple standard benchmarks, and ablation studies confirm that the self-context mechanism enhances both detection accuracy and cross-scenario generalization.

๐Ÿ“ Abstract
Anomaly detection in videos is a challenging task as anomalies in different videos are of different kinds. Therefore, a promising way to approach video anomaly detection is by learning the non-anomalous nature of the video at hand. To this end, we propose a one-class few-shot learning driven transformer based approach for anomaly detection in videos that is self-context aware. Features from the first few consecutive non-anomalous frames in a video are used to train the transformer in predicting the non-anomalous feature of the subsequent frame. This takes place under the attention of a self-context learned from the input features themselves. After the learning, given a few previous frames, the video-specific transformer is used to infer if a frame is anomalous or not by comparing the feature predicted by it with the actual. The effectiveness of the proposed method with respect to the state-of-the-art is demonstrated through qualitative and quantitative results on different standard datasets. We also study the positive effect of the self-context used in our approach.
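The prediction-residual idea above can be sketched in a few lines. This is not the paper's method: a running mean over the previous `context` frame features stands in for the video-specific transformer predictor, and the frame feature extractor is omitted, so only the scoring step (residual between predicted and actual features) is illustrated.

```python
import numpy as np

def anomaly_scores(frame_features, context=4):
    """Score each frame by the residual between a predicted feature and
    the actual one. A running mean over the previous `context` frames
    stands in for the paper's video-specific transformer predictor."""
    scores = []
    for t in range(context, len(frame_features)):
        # Stand-in predictor: mean of the preceding `context` frame features.
        predicted = frame_features[t - context:t].mean(axis=0)
        # Prediction error; a large residual flags the frame as anomalous.
        residual = np.linalg.norm(frame_features[t] - predicted)
        scores.append(residual)
    return np.array(scores)
```

In this toy setup, eight "normal" frames with identical features score near zero, while a ninth frame whose feature jumps away from them receives a large residual and would be flagged once a threshold is applied.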
Problem

Research questions and friction points this paper is trying to address.

Anomalies differ in kind from one video to another, so a single generic model fits poorly.
Labeled anomalous samples are scarce, which rules out supervised training on anomalies.
Detection must therefore learn the non-anomalous nature of the video at hand from only a few initial frames.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based self-context aware prediction
Few-shot learning for anomaly detection
Video-specific transformer for frame comparison
Gargi V. Pillai
Department of E&ECE, Indian Institute of Technology Kharagpur, India
A. Verma
Department of E&ECE, Indian Institute of Technology Kharagpur, India
Debashis Sen
Department of E&ECE, Indian Institute of Technology Kharagpur, India
Vision · Image and Video Processing · Uncertainty Handling · Deep Learning