🤖 AI Summary
This work addresses the limitations of skeleton-based representations in pose-driven video anomaly detection, specifically poor class coverage and inadequate privacy preservation. We propose, for the first time, using 2D human contours instead of skeletons as the foundational behavioral representation. Our method introduces a dual-path learning framework jointly optimizing contour sequence reconstruction (regression) and anomaly classification, augmented by two lightweight contour encoding strategies and implemented via shallow neural networks for computational efficiency. Extensive experiments across six benchmark datasets, including UCF-Crime and ShanghaiTech, demonstrate significantly reduced computational overhead while maintaining competitive detection performance. Key contributions are: (1) establishing a novel contour-driven paradigm for pose-based anomaly detection; (2) eliminating skeleton dependency to enhance multi-class generalizability and privacy compliance; and (3) providing an extensible, resource-efficient lightweight detection framework.
📝 Abstract
In Pose-based Video Anomaly Detection, prior art is rooted in the assumption that abnormal events can mostly be regarded as the result of uncommon human behavior. Rather than using skeleton representations of humans, however, we investigate the potential of learning recurrent motion patterns of normal human behavior from 2D contours. The shift from human skeletons to contours retains the advantages of pose-based methods, such as increased object anonymization, and is hypothesized to leave open the opportunity to cover more object categories in future research. We propose formulating the problem as both a regression and a classification task, and additionally explore two distinct data representation techniques for contours. To further reduce the computational complexity of Pose-based Video Anomaly Detection solutions, all methods in this study are based on shallow Neural Networks from the field of Deep Learning and are evaluated on the three most prominent benchmark datasets within Video Anomaly Detection and their human-related counterparts, totaling six datasets. Our results indicate that this novel perspective on Pose-based Video Anomaly Detection marks a promising direction for future research.