🤖 AI Summary
To address the challenge of promptly detecting diverse and sporadic violent behaviors, where conventional surveillance methods fall short, this paper proposes an end-to-end video violence detection and classification system tailored for real-time security applications. Methodologically, it introduces a hybrid architecture that combines 3D CNNs with a separable 3D convolutional feature extractor followed by a bidirectional LSTM, enabling fine-grained temporal modeling. It establishes a unified frame-level annotation protocol for heterogeneous, cross-source videos (e.g., surveillance footage, smartphone recordings, sports broadcasts, and synthetic data) and curates a customized multi-source dataset accordingly. The system is deployed on a Raspberry Pi edge platform, supporting a fully automated pipeline from video acquisition and feature extraction to multi-class violence recognition. Evaluated on a multi-source mixed test set, it achieves 92.7% accuracy with a 38% reduction in inference latency, significantly improving edge resource efficiency and real-time responsiveness.
📝 Abstract
The increasing global crime rate, coupled with substantial human and property losses, highlights the limitations of traditional surveillance methods in promptly detecting diverse and unexpected acts of violence. Addressing this pressing need for automatic violence detection, we leverage Machine Learning to detect and categorize violent events in video streams. This paper introduces a comprehensive framework for violence detection and classification, employing Supervised Learning for both binary and multi-class violence classification. The detection model relies on 3D Convolutional Neural Networks, while the classification model uses a separable 3D convolutional model for feature extraction and a bidirectional LSTM for temporal processing. Training is conducted on a diverse, customized dataset with frame-level annotations, incorporating videos from surveillance cameras, human recordings, and the Hockey Fight, SOHAS, and WVD datasets collected across various platforms. Additionally, a camera module integrated with a Raspberry Pi captures a live video feed, which is sent to the ML model for processing. The resulting system demonstrates improved performance in terms of computational resource efficiency and accuracy.
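To make the described classification pipeline concrete (a separable 3D convolutional feature extractor feeding a bidirectional LSTM for temporal processing), the following is a minimal PyTorch-style sketch. All layer widths, depths, and class counts here are illustrative assumptions, not the paper's actual configuration; `SeparableConv3D` and `ViolenceClassifier` are hypothetical names introduced for this example.

```python
import torch
import torch.nn as nn

class SeparableConv3D(nn.Module):
    """Factorized 3D convolution: a spatial (1,k,k) conv followed by a
    temporal (k,1,1) conv. Sketch only; sizes are illustrative."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        p = k // 2
        self.spatial = nn.Conv3d(in_ch, out_ch, (1, k, k), padding=(0, p, p))
        self.temporal = nn.Conv3d(out_ch, out_ch, (k, 1, 1), padding=(p, 0, 0))
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.temporal(self.act(self.spatial(x))))

class ViolenceClassifier(nn.Module):
    """Separable-3D-conv feature extractor + bidirectional LSTM head,
    mirroring the abstract's architecture at a toy scale."""
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            SeparableConv3D(3, 16),
            nn.MaxPool3d((1, 2, 2)),
            SeparableConv3D(16, 32),
            # Keep the time axis, collapse the spatial axes to 1x1.
            nn.AdaptiveAvgPool3d((None, 1, 1)),
        )
        self.lstm = nn.LSTM(32, 64, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 64, num_classes)

    def forward(self, clip):  # clip: (batch, channels, frames, H, W)
        f = self.features(clip)                        # (B, 32, T, 1, 1)
        f = f.squeeze(-1).squeeze(-1).transpose(1, 2)  # (B, T, 32)
        out, _ = self.lstm(f)                          # (B, T, 128)
        return self.fc(out[:, -1])                     # class logits

model = ViolenceClassifier(num_classes=4)
logits = model(torch.randn(2, 3, 16, 64, 64))  # 2 clips of 16 frames
print(logits.shape)  # torch.Size([2, 4])
```

In a deployment like the one described, frames captured by the Raspberry Pi camera module would be batched into fixed-length clips before being passed to `forward`; the binary detection model would use a plain 3D CNN in place of this multi-class head.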