VADMamba: Exploring State Space Models for Fast Video Anomaly Detection

📅 2025-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the longstanding trade-off between detection accuracy and inference speed in video anomaly detection, this paper pioneers the integration of state space models, specifically Mamba, into the task. The proposed VQ-Mamba UNet combines a vector quantization (VQ) layer with a Mamba-based Non-negative Visual State Space (NVSS) block, enabling joint frame prediction and optical flow reconstruction within a multi-task learning framework. A clip-level dual-branch evaluation strategy further strengthens anomaly localization and scoring robustness. The method achieves state-of-the-art inference speed on three mainstream benchmarks, outperforming CNN- and Transformer-based baselines by several fold, while maintaining competitive detection accuracy. The results suggest that state space models offer strong spatiotemporal modeling capacity and significant potential for real-time, high-fidelity video anomaly detection.

📝 Abstract
Video anomaly detection (VAD) methods are mostly CNN-based or Transformer-based, achieving impressive results, but the focus on detection accuracy often comes at the expense of inference speed. The emergence of state space models in computer vision, exemplified by the Mamba model, demonstrates improved computational efficiency through selective scans and showcases the great potential for long-range modeling. Our study pioneers the application of Mamba to VAD, dubbed VADMamba, which is based on multi-task learning for frame prediction and optical flow reconstruction. Specifically, we propose the VQ-Mamba Unet (VQ-MaU) framework, which incorporates a Vector Quantization (VQ) layer and Mamba-based Non-negative Visual State Space (NVSS) block. Furthermore, two individual VQ-MaU networks separately predict frames and reconstruct corresponding optical flows, further boosting accuracy through a clip-level fusion evaluation strategy. Experimental results validate the efficacy of the proposed VADMamba across three benchmark datasets, demonstrating superior performance in inference speed compared to previous work. Code is available at https://github.com/jLooo/VADMamba.
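The clip-level fusion evaluation combines two anomaly cues per clip: the error of the predicted frame and the error of the reconstructed optical flow. As an illustrative sketch only (not the paper's exact formulation; the PSNR-based frame error, min-max normalization, and the balancing weight `alpha` are common VAD conventions assumed here), the scoring could look like:

```python
import numpy as np

def psnr(pred, gt):
    # Peak signal-to-noise ratio between predicted and ground-truth frames,
    # assuming pixel values normalized to [0, 1]. Lower PSNR = more anomalous.
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(1.0 / (mse + 1e-8))

def min_max_normalize(scores):
    # Normalize a clip's per-frame scores to [0, 1].
    scores = np.asarray(scores, dtype=np.float64)
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo + 1e-8)

def fused_anomaly_scores(frame_errors, flow_errors, alpha=0.5):
    # Dual-branch fusion: normalize each branch over the clip, then take a
    # weighted sum. `alpha` is a hypothetical balancing weight, not a value
    # taken from the paper.
    s_frame = min_max_normalize(frame_errors)
    s_flow = min_max_normalize(flow_errors)
    return alpha * s_frame + (1.0 - alpha) * s_flow
```

Normalizing within each clip before fusing keeps the two branches on a comparable scale, which is the usual motivation for clip-level (rather than video-level) score aggregation.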
Problem

Research questions and friction points this paper is trying to address.

Improving video anomaly detection inference speed using state space models
Combining the Mamba model with multi-task learning for VAD
Enhancing accuracy via clip-level fusion of frame prediction and optical flow reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses the Mamba state space model for fast video anomaly detection
Incorporates a Vector Quantization (VQ) layer and a Mamba-based NVSS block
Employs multi-task learning for frame prediction and optical flow reconstruction
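The VQ layer follows the standard VQ-VAE idea of snapping continuous encoder features to their nearest entries in a learned codebook; the paper's exact placement inside VQ-MaU and its training losses are not reproduced here, but the core lookup can be sketched as (a minimal NumPy illustration, with shapes and names chosen for clarity):

```python
import numpy as np

def vq_lookup(z, codebook):
    """Nearest-neighbor vector quantization (VQ-VAE-style lookup).

    z:        (N, D) array of encoder feature vectors
    codebook: (K, D) array of learned code vectors
    Returns the quantized vectors and the chosen code indices.
    """
    # Squared Euclidean distance from every feature to every code,
    # via broadcasting: (N, 1, D) - (1, K, D) -> (N, K).
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return codebook[indices], indices
```

In training, the argmin is non-differentiable, so VQ layers typically pass gradients through with a straight-through estimator and add codebook/commitment losses; only the inference-time lookup is shown above.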
Jiahao Lyu
School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, China
Minghua Zhao
School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, China
Jing Hu
Associate professor, School of Computer Science and Engineering, Xi'an University of Technology
hyperspectral image processing
Xuewen Huang
School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, China
Yifei Chen
School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, China
Shuangli Du
Xi'an University of Technology
deep learning