CRCL: Causal Representation Consistency Learning for Anomaly Detection in Surveillance Videos

📅 2025-03-24
🤖 AI Summary
This paper addresses two problems in unsupervised surveillance video anomaly detection: label-agnostic data shifts induced by scene changes, and missed detection of subtle anomalies. It introduces structural causal models (SCMs) to this task for the first time, proposing a causality-driven normality modeling framework. Methodologically, it designs two mechanisms, scene-debiasing learning and causality-inspired normality learning, which integrate contrastive consistency learning, feature disentanglement, and causal intervention to learn robust causal representations end to end. Contributions include: (i) the first unsupervised causal representation learning paradigm tailored for video anomaly detection; (ii) strong generalization across diverse scene biases; and (iii) superior performance over existing methods on mainstream benchmarks, with stable results even when only limited training data is available.
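The summary names "causal intervention" without spelling it out. One standard form of intervention on a confounder is Pearl's backdoor adjustment, P(y | do(x)) = Σ_z P(y | x, z) P(z). The sketch below treats the scene as the confounder z purely for illustration; the function names `interventional` and `observational` and the toy probability tables are hypothetical and are not taken from CRCL, which works on learned deep representations rather than explicit tables.

```python
import numpy as np


def interventional(p_y_given_xz: np.ndarray, p_z: np.ndarray) -> np.ndarray:
    """Backdoor adjustment: P(y | do(x)) = sum_z P(y | x, z) * P(z).

    p_y_given_xz has shape (X, Z, Y); p_z has shape (Z,).
    Returns an (X, Y) table whose rows are distributions over y.
    """
    return np.einsum("xzy,z->xy", p_y_given_xz, p_z)


def observational(p_y_given_xz: np.ndarray, p_z_given_x: np.ndarray) -> np.ndarray:
    """Plain conditioning: P(y | x) = sum_z P(y | x, z) * P(z | x)."""
    return np.einsum("xzy,xz->xy", p_y_given_xz, p_z_given_x)


# Hypothetical toy tables: x = binary feature state, z = scene id, y = generic binary outcome.
p_y_given_xz = np.array([
    [[0.9, 0.1], [0.5, 0.5]],  # x = 0, scenes z = 0 and z = 1
    [[0.8, 0.2], [0.2, 0.8]],  # x = 1, scenes z = 0 and z = 1
])
p_z = np.array([0.5, 0.5])            # scene prior used by the intervention
p_z_given_x = np.array([[0.9, 0.1],   # x = 0 mostly co-occurs with scene 0
                        [0.1, 0.9]])  # x = 1 mostly co-occurs with scene 1

obs = observational(p_y_given_xz, p_z_given_x)  # rows approx [0.86, 0.14] and [0.26, 0.74]
do = interventional(p_y_given_xz, p_z)          # rows approx [0.70, 0.30] and [0.50, 0.50]
print(obs)
print(do)
```

Conditioning on x alone lets scene membership inflate the apparent effect of x on y (0.86 vs. 0.26 for the first outcome), while averaging over the scene prior removes that spurious path (0.70 vs. 0.50). This is only an analogy for what "stripping away scene bias" aims at.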

📝 Abstract
Video Anomaly Detection (VAD) remains a fundamental yet formidable task in the video understanding community, with promising applications in areas such as information forensics and public safety protection. Due to the rarity and diversity of anomalies, existing methods use only easily collected regular events to model the inherent normality of spatial-temporal patterns in an unsupervised manner. Previous studies have shown that existing unsupervised VAD models cannot cope with label-independent data shifts (e.g., scene changes) in real-world scenarios and may fail to respond to subtle anomalies because deep neural networks overgeneralize. Inspired by causal learning, we argue that there exist causal factors that adequately generalize the prototypical patterns of regular events and deviate significantly when anomalous instances occur. To this end, we propose Causal Representation Consistency Learning (CRCL) to implicitly mine potential scene-robust causal variables in unsupervised video normality learning. Specifically, building on structural causal models, we propose scene-debiasing learning and causality-inspired normality learning to, respectively, strip away the scene bias entangled in deep representations and learn causal video normality. Extensive experiments on benchmarks validate the superiority of our method over conventional deep representation learning. Moreover, ablation studies and extension validation show that CRCL can cope with label-independent biases in multi-scene settings and maintain stable performance when only limited training data is available.
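The abstract frames VAD as normality modeling: fit only regular events, then flag inputs that deviate from them. A simple way to illustrate that scoring step is a k-nearest-neighbor distance on fixed feature vectors. This is a swapped-in generic baseline, not CRCL's scoring rule; `knn_anomaly_scores`, the feature size, `k`, and the synthetic data are all hypothetical.

```python
import numpy as np


def knn_anomaly_scores(normal_feats: np.ndarray, test_feats: np.ndarray, k: int = 1) -> np.ndarray:
    """Score each test feature by its mean distance to the k nearest normal features.

    Only normal (regular) training features are used, mirroring the unsupervised setting.
    Scores are min-max normalized to [0, 1] over the test set.
    """
    # Pairwise Euclidean distances, shape (n_test, n_normal).
    dists = np.linalg.norm(test_feats[:, None, :] - normal_feats[None, :, :], axis=-1)
    # Average of the k smallest distances per test sample.
    raw = np.sort(dists, axis=1)[:, :k].mean(axis=1)
    span = raw.max() - raw.min()
    return (raw - raw.min()) / span if span > 0 else np.zeros_like(raw)


rng = np.random.default_rng(0)
normal_feats = rng.normal(0.0, 0.1, size=(200, 8))  # hypothetical features of regular clips
test_feats = np.vstack([
    rng.normal(0.0, 0.1, size=(5, 8)),              # five regular-looking clips
    np.full((1, 8), 3.0),                           # one clip far from every normal pattern
])
scores = knn_anomaly_scores(normal_feats, test_feats, k=3)
print(scores.argmax())  # → 5
```

The score depends entirely on the representation: if an encoder overgeneralizes so that a subtle anomaly's features land near normal ones, its distance (and score) stays low. That is the failure mode the abstract attributes to conventional deep representations.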
Problem

Research questions and friction points this paper is trying to address.

Detecting anomalies in surveillance videos despite their rarity and diversity.
Addressing label-independent data shifts (e.g., scene changes) that unsupervised VAD models fail to handle.
Learning causal video normality that is robust to scene bias.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal Representation Consistency Learning (CRCL) for video anomaly detection
Scene-debiasing learning to strip scene bias from deep representations
Causality-inspired normality learning for stable performance across scenes and with limited training data
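The page names consistency learning but gives no loss. A common contrastive objective for making two views of the same sample agree is the symmetric InfoNCE loss; whether and how CRCL uses it is not stated here, so treat this as a generic sketch. The function name `info_nce_consistency`, the temperature, and the way the second view is built are assumptions. In a scene-debiasing context one might, hypothetically, pair views of the same clip that differ only in scene-related appearance.

```python
import numpy as np


def info_nce_consistency(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1) -> float:
    """Symmetric InfoNCE loss; row i of z1 and row i of z2 form the positive pair."""
    # L2-normalize so the dot product is cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (N, N); diagonal entries are the positives

    def diag_cross_entropy(l: np.ndarray) -> float:
        # Numerically stable log-softmax over each row, with targets on the diagonal.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return float(-np.mean(np.diag(log_probs)))

    # Average the z1->z2 and z2->z1 directions.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


rng = np.random.default_rng(0)
z1 = rng.normal(size=(16, 32))              # hypothetical view-1 embeddings of 16 clips
z2 = z1 + 0.05 * rng.normal(size=z1.shape)  # view 2: same clips, small perturbation
aligned = info_nce_consistency(z1, z2)
mismatched = info_nce_consistency(z1, np.roll(z2, 1, axis=0))  # every positive pair is now wrong
print(aligned < mismatched)  # → True
```

Minimizing this loss pulls the two representations of each clip together while pushing apart representations of different clips, which is the sense in which a representation is made "consistent" across views.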
Yang Liu
Department of Computer Science, University of Texas at Dallas
Hongjin Wang
Department of Computer Science, University of Texas at Dallas
Zepu Wang
University of Washington
Time Series, Spatial Temporal Data Mining, Transportation, Vehicle Network
Xiaoguang Zhu
Postdoc Researcher, University of California, Davis
AI for Health, Computer Vision, Image Retrieval, Video Analysis
Jing Liu
Department of Computer Science, University of Texas at Dallas
Peng Sun
Department of Computer Science, University of Texas at Dallas
Rui Tang
Department of Computer Science, University of Texas at Dallas
Jianwei Du
Department of Computer Science, University of Texas at Dallas
Victor C. M. Leung
SMBU / Shenzhen University / The University of British Columbia
communication systems, wireless networks, mobile systems
Liang Song
Department of Computer Science, University of Texas at Dallas