🤖 AI Summary
This work proposes CPL-VAD, a dual-branch framework for weakly supervised video anomaly detection that performs temporal localization and category identification jointly, using only video-level labels. A cross pseudo-labeling mechanism lets the binary anomaly detection branch and the category classification branch supervise each other, and the classification branch uses vision-language alignment to strengthen semantic discrimination. On the XD-Violence and UCF-Crime benchmarks, the method is reported to achieve state-of-the-art performance in both anomaly detection and abnormal category classification.
📝 Abstract
Weakly supervised video anomaly detection aims to detect anomalies and identify abnormal categories with only video-level labels. We propose CPL-VAD, a dual-branch framework with cross pseudo labeling. The binary anomaly detection branch focuses on snippet-level anomaly localization, while the category classification branch leverages vision-language alignment to recognize abnormal event categories. By exchanging pseudo labels, the two branches transfer complementary strengths, combining temporal precision with semantic discrimination. Experiments on XD-Violence and UCF-Crime demonstrate that CPL-VAD achieves state-of-the-art performance in both anomaly detection and abnormal category classification.
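The abstract does not spell out how the pseudo labels are built or exchanged, so the following is only a minimal sketch of one way a cross pseudo-labeling step could look, not the authors' implementation. Everything in it is an assumption made for illustration: the function names `category_probs` and `cross_pseudo_labels`, the convention that class index 0 means "normal", the fixed `threshold`, the softmax `temperature`, and the specific rules used to turn one branch's output into targets for the other.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def category_probs(snippet_feats, text_embeds, temperature=0.07):
    """Hypothetical vision-language alignment step.

    snippet_feats: (T, D) visual features, one row per snippet.
    text_embeds:   (C, D) text embeddings of class prompts; row 0 is
                   assumed to be the "normal" class.
    Returns (T, C) per-snippet class probabilities from cosine similarity.
    """
    v = snippet_feats / np.linalg.norm(snippet_feats, axis=1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    return softmax(v @ t.T / temperature, axis=1)


def cross_pseudo_labels(anomaly_scores, cat_probs, video_label, threshold=0.5):
    """Build snippet-level targets for each branch from the other branch.

    anomaly_scores: (T,) scores in [0, 1] from the binary branch.
    cat_probs:      (T, C) probabilities from the category branch.
    video_label:    int video-level class, 0 meaning a normal video.
    Returns (cat_targets, bin_targets), both of shape (T,).
    """
    num_snippets = anomaly_scores.shape[0]
    if video_label == 0:
        # A normal video contains no anomalous snippets by definition.
        zeros = np.zeros(num_snippets, dtype=int)
        return zeros, zeros.copy()

    # Binary -> category (assumed rule): snippets the detector flags
    # inherit the video-level category, the rest are treated as normal.
    cat_targets = np.where(anomaly_scores > threshold, video_label, 0)

    # Category -> binary (assumed rule): a snippet is anomalous when the
    # video's category is more probable than the normal class.
    bin_targets = (cat_probs[:, video_label] > cat_probs[:, 0]).astype(int)
    return cat_targets, bin_targets


# Toy example: 4 snippets, 3 classes (0 = normal), video labeled class 2.
scores = np.array([0.1, 0.9, 0.8, 0.2])
probs = np.array([
    [0.90, 0.05, 0.05],
    [0.20, 0.10, 0.70],
    [0.60, 0.10, 0.30],
    [0.10, 0.10, 0.80],
])
cat_targets, bin_targets = cross_pseudo_labels(scores, probs, video_label=2)
print(cat_targets)  # [0 2 2 0]
print(bin_targets)  # [0 1 0 1]
```

In the toy example the two branches disagree on snippets 3 and 4, which is the situation where exchanging targets can pass information from one branch to the other. A real training loop would presumably add the video-level weak supervision loss and some confidence filtering, neither of which is described in the abstract.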