Pseudo Anomalies Are All You Need: Diffusion-Based Generation for Weakly-Supervised Video Anomaly Detection

📅 2025-12-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Video anomaly detection (VAD) is hampered by the scarcity and high acquisition cost of real abnormal footage. PA-VAD is a generation-driven approach that trains a detector without any real abnormal videos while being evaluated under the standard weakly supervised split. Using only a small set of real normal images, it selects class-relevant initial images with CLIP, refines textual prompts with a vision-language model, and then invokes a video diffusion model to synthesize pseudo-abnormal videos with improved fidelity and scene consistency. During detector training, a domain-aligned regularized module, which combines domain alignment with memory usage-aware updates, mitigates the excessive spatiotemporal magnitude of the synthesized anomalies. PA-VAD reaches AUC scores of 98.2% on ShanghaiTech and 82.5% on UCF-Crime, exceeding the strongest method trained on real abnormal videos on ShanghaiTech by 0.6% and the UVAD state of the art on UCF-Crime by 1.9%.
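For readers unfamiliar with the AUC metric quoted in the summary, the following is a minimal sketch of how a ROC AUC can be computed from anomaly scores. It assumes per-frame binary labels (1 = abnormal, 0 = normal) and real-valued scores; the summary does not describe the paper's exact evaluation protocol, so the function name `roc_auc` and the toy data are illustrative only.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random abnormal item outscores
    a random normal item, with ties counted as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    # Pairwise comparison: simple and exact, but quadratic in the number of items.
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical scores, not results from the paper).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise form is chosen for clarity; for long videos a rank-based computation would scale better.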

📝 Abstract
Deploying video anomaly detection in practice is hampered by the scarcity and collection cost of real abnormal footage. We address this by training without any real abnormal videos while evaluating under the standard weakly supervised split, and we introduce PA-VAD, a generation-driven approach that learns a detector from synthesized pseudo-abnormal videos paired with real normal videos, using only a small set of real normal images to drive synthesis. For synthesis, we select class-relevant initial images with CLIP and refine textual prompts with a vision-language model to improve fidelity and scene consistency before invoking a video diffusion model. For training, we mitigate excessive spatiotemporal magnitude in synthesized anomalies by a domain-aligned regularized module that combines domain alignment and memory usage-aware updates. Extensive experiments show that our approach reaches 98.2% on ShanghaiTech and 82.5% on UCF-Crime, surpassing the strongest real-abnormal method on ShanghaiTech by +0.6% and outperforming the UVAD state-of-the-art on UCF-Crime by +1.9%. The results demonstrate that high-accuracy anomaly detection can be obtained without collecting real anomalies, providing a practical path toward scalable deployment.
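The CLIP-based selection of class-relevant initial images can be illustrated with a small sketch. It assumes image and text embeddings have already been computed by some CLIP-style encoder and ranks images by cosine similarity to the embedding of an anomaly-class description. The function names, the `top_k` parameter, and the toy vectors are hypothetical; the abstract does not state the actual selection rule used by PA-VAD.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_initial_images(
    image_embeddings: Sequence[Sequence[float]],
    class_text_embedding: Sequence[float],
    top_k: int,
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the top_k images closest to the class text."""
    scored = [
        (idx, cosine_similarity(emb, class_text_embedding))
        for idx, emb in enumerate(image_embeddings)
    ]
    # Highest similarity first; ties broken by lower index for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Toy 2-D embeddings standing in for precomputed encoder outputs (assumption).
toy_images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_text = [1.0, 0.0]
selected = select_initial_images(toy_images, toy_text, top_k=2)
```

In a full pipeline, the selected images would then be passed, together with the refined prompts, to the video diffusion model; that stage is not sketched here because the abstract gives no details about it.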
Problem

Research questions and friction points this paper is trying to address.

Detects video anomalies without real abnormal footage
Generates pseudo-abnormal videos using diffusion models
Improves detection accuracy with domain-aligned regularization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates pseudo-abnormal videos using diffusion models
Refines prompts with vision-language models for fidelity
Mitigates excessive spatiotemporal magnitude in synthesized anomalies via domain-aligned regularization
Satoshi Hashimoto
KDDI Research, Inc.
Hitoshi Nishimura
KDDI Research, Inc.
Yanan Wang
KDDI Research, Inc.
Mori Kurokawa
KDDI Research, Inc.
Machine Learning