Transformer Based Self-Context Aware Prediction for Few-Shot Anomaly Detection in Videos

📅 2022-10-16
🏛️ IEEE International Conference on Image Processing (ICIP)
📈 Citations: 6
✨ Influential: 0
📄 PDF
🤖 AI Summary
Video anomaly detection faces challenges from diverse anomaly types and severe scarcity of labeled anomalies. This paper proposes a self-context-aware one-class few-shot Transformer framework that trains video-specific models using only the initial normal frames of each video. Leveraging self-supervised temporal attention, the model predicts subsequent frame features and localizes anomalies at the frame level via prediction–ground-truth feature residuals. Crucially, it requires no anomalous samples, enabling both video-specific modeling and dynamic contextual adaptation. The core innovation lies in deeply integrating self-attention with one-class few-shot temporal forecasting to establish an end-to-end prediction-residual detection paradigm. Experiments demonstrate improvements over state-of-the-art methods across multiple standard benchmarks, and ablation studies confirm that the self-context mechanism enhances both detection accuracy and cross-scenario generalization.

๐Ÿ“ Abstract
Anomaly detection in videos is a challenging task as anomalies in different videos are of different kinds. Therefore, a promising way to approach video anomaly detection is by learning the non-anomalous nature of the video at hand. To this end, we propose a one-class few-shot learning driven transformer based approach for anomaly detection in videos that is self-context aware. Features from the first few consecutive non-anomalous frames in a video are used to train the transformer in predicting the non-anomalous feature of the subsequent frame. This takes place under the attention of a self-context learned from the input features themselves. After the learning, given a few previous frames, the video-specific transformer is used to infer if a frame is anomalous or not by comparing the feature predicted by it with the actual. The effectiveness of the proposed method with respect to the state-of-the-art is demonstrated through qualitative and quantitative results on different standard datasets. We also study the positive effect of the self-context used in our approach.
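The prediction-residual idea above can be sketched in a few lines. This is not the paper's method: a running mean over the previous `context` frame features stands in for the video-specific transformer predictor, and the frame feature extractor is omitted, so only the scoring step (residual between predicted and actual features) is illustrated.

```python
import numpy as np

def anomaly_scores(frame_features, context=4):
    """Score each frame by the residual between a predicted feature and
    the actual one. A running mean over the previous `context` frames
    stands in for the paper's video-specific transformer predictor."""
    scores = []
    for t in range(context, len(frame_features)):
        # Stand-in predictor: mean of the preceding `context` frame features.
        predicted = frame_features[t - context:t].mean(axis=0)
        # Prediction error; a large residual flags the frame as anomalous.
        residual = np.linalg.norm(frame_features[t] - predicted)
        scores.append(residual)
    return np.array(scores)
```

In this toy setup, eight "normal" frames with identical features score near zero, while a ninth frame whose feature jumps away from them receives a large residual and would be flagged once a threshold is applied.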
Problem

Research questions and friction points this paper is trying to address.

Anomalies differ in kind from one video to another, so a single generic model fits poorly.
Labeled anomalous samples are scarce, which rules out supervised training on anomalies.
Detection must therefore learn the non-anomalous nature of the video at hand from only a few initial frames.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based self-context aware prediction
Few-shot learning for anomaly detection
Video-specific transformer for frame comparison
Gargi V. Pillai
Department of E&ECE, Indian Institute of Technology Kharagpur, India
A. Verma
Department of E&ECE, Indian Institute of Technology Kharagpur, India
Debashis Sen
Department of E&ECE, Indian Institute of Technology Kharagpur, India
Vision · Image and Video Processing · Uncertainty Handling · Deep Learning