Forward Consistency Learning with Gated Context Aggregation for Video Anomaly Detection

📅 2026-01-26
🤖 AI Summary
This work addresses two limitations of existing video anomaly detection methods: reliance on large models, which hinders deployment on edge devices, and dependence on single-frame prediction errors, which neglects long-term temporal consistency. To overcome these challenges, the authors propose FoGA, a lightweight model built on a U-Net architecture that incorporates a gated context aggregation module to dynamically fuse encoder and decoder features. FoGA introduces forward consistency learning, the first such approach in the field, realized through a novel forward consistency loss combined with a hybrid anomaly scoring strategy. With only about 2 million parameters and a throughput of 155 FPS, FoGA achieves state-of-the-art performance across multiple benchmarks, offering both high detection accuracy and practical feasibility for edge deployment.
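The summary mentions a gated context aggregation module that dynamically fuses encoder and decoder features in the skip connections. The paper excerpt does not specify how the gate is computed; a minimal sketch, assuming the gate logits come from some learned function of the features (in the real model presumably a small convolution) and operating element-wise on flat feature lists, could look like this:

```python
import math

def sigmoid(x):
    # Logistic gate activation, squashing a logit into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def gated_aggregation(encoder_feat, decoder_feat, gate_logits):
    # Element-wise gated fusion of same-scale encoder and decoder
    # features: the gate decides how much encoder detail passes
    # through the skip connection versus decoder context.
    fused = []
    for e, d, g in zip(encoder_feat, decoder_feat, gate_logits):
        gate = sigmoid(g)
        fused.append(gate * e + (1.0 - gate) * d)
    return fused
```

A strongly positive logit keeps the encoder feature, a strongly negative one keeps the decoder feature, and a zero logit blends them equally; the names `gated_aggregation` and `gate_logits` are illustrative, not taken from the paper.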

📝 Abstract
As a crucial element of public security, video anomaly detection (VAD) aims to measure deviations from normal patterns for various events in real-time surveillance systems. However, most existing VAD methods rely on large-scale models to pursue extreme accuracy, limiting their feasibility on resource-limited edge devices. Moreover, mainstream prediction-based VAD detects anomalies using only single-frame future prediction errors, overlooking the richer constraints offered by longer-term temporal forward information. In this paper, we introduce FoGA, a lightweight VAD model that performs Forward consistency learning with Gated context Aggregation, contains about 2M parameters, and is tailored for potential edge devices. Specifically, we propose a U-Net-based method that extracts features from consecutive frames to generate both immediate and forward predictions. Then, we introduce a gated context aggregation module into the skip connections to dynamically fuse encoder and decoder features at the same spatial scale. Finally, the model is jointly optimized with a novel forward consistency loss, and a hybrid anomaly measurement strategy integrates errors from both immediate and forward frames for more accurate detection. Extensive experiments demonstrate the effectiveness of the proposed method, which substantially outperforms state-of-the-art competing methods while running at up to 155 FPS. Hence, FoGA achieves an excellent trade-off between performance and efficiency.
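The abstract describes a hybrid anomaly measurement that integrates errors from both immediate and forward predicted frames. The exact formula is not given in this excerpt; a minimal sketch, assuming a PSNR-based error measure (common in prediction-based VAD) and a hypothetical weighting factor `alpha`, might be:

```python
import math

def psnr(mse, peak=1.0):
    # Peak signal-to-noise ratio of a predicted frame; higher PSNR
    # means a better prediction, i.e. a more "normal" frame.
    return 10.0 * math.log10(peak * peak / mse)

def hybrid_anomaly_score(immediate_mse, forward_mse, alpha=0.5):
    # Weighted fusion of immediate- and forward-frame prediction
    # quality; negating the PSNR combination makes the score rise
    # with anomaly severity, so larger values flag anomalous frames.
    return -(alpha * psnr(immediate_mse) + (1.0 - alpha) * psnr(forward_mse))
```

In practice, per-frame scores are typically min-max normalized over each test video before thresholding; that step is omitted here for brevity.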
Problem

Research questions and friction points this paper is trying to address.

video anomaly detection
edge devices
forward consistency
temporal prediction
lightweight model
Innovation

Methods, ideas, or system contributions that make the work stand out.

Forward Consistency Learning
Gated Context Aggregation
Lightweight Video Anomaly Detection
Temporal Prediction
Edge-efficient Model
Jiahao Lyu
Xi’an University of Technology
video anomaly detection
Minghua Zhao
Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, 710048, China
Xuewen Huang
School of Cyber Science and Engineering, Xi’an Jiaotong University, Xi’an, 710049, China
Yifei Chen
Master of CS, Xi'an University of Technology
Video Anomaly Detection, Computer Vision, Facial Expression Recognition
Shuangli Du
Xi'an University of Technology
deep learning
Jing Hu
Associate professor, School of Computer Science and Engineering, Xi'an University of Technology
hyperspectral image processing
Cheng Shi
Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, 710048, China
Zhiyong Lv
Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, 710048, China