Intelligent Image Sensing for Crime Analysis: A ML Approach towards Enhanced Violence Detection and Investigation

📅 2025-06-16
🤖 AI Summary
To address the challenge of promptly detecting diverse and sporadic violent behaviors, where conventional surveillance methods fall short, this paper proposes an end-to-end video violence detection and classification system for real-time security applications. Methodologically, it combines a 3D CNN for violence detection with a classification model that pairs a separable 3D convolutional network for feature extraction with a bidirectional LSTM for fine-grained temporal modeling. The authors apply a unified frame-level annotation protocol to heterogeneous, cross-source videos (e.g., surveillance footage, smartphone recordings, sports broadcasts, and synthetic data) and curate a custom multi-source dataset accordingly. A Raspberry Pi with a camera module captures the live video feed, and the pipeline runs fully automatically, from video acquisition and feature extraction to multi-class violence recognition. Evaluated on a multi-source mixed test set, the system achieves 92.7% accuracy with a 38% reduction in inference latency, improving computational resource efficiency and real-time responsiveness.
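The efficiency argument for a separable 3D convolution can be made concrete with a rough weight count. The sketch below assumes an S3D-style factorization, in which a k x k x k 3D convolution is replaced by a 1 x k x k spatial convolution followed by a k x 1 x 1 temporal convolution. The summary does not spell out the exact factorization, and the channel counts used here are hypothetical. The sketch counts weights only (no bias) and does not reproduce the reported 38% latency figure, which depends on the full model and the hardware.

```python
def conv3d_params(c_in: int, c_out: int, kt: int, kh: int, kw: int) -> int:
    """Number of weights in a full 3D convolution (bias omitted)."""
    return c_in * c_out * kt * kh * kw


def separable_conv3d_params(c_in: int, c_mid: int, c_out: int, k: int) -> int:
    """Weights in an assumed S3D-style factorization (bias omitted):
    a 1 x k x k spatial convolution (c_in -> c_mid) followed by a
    k x 1 x 1 temporal convolution (c_mid -> c_out)."""
    spatial = conv3d_params(c_in, c_mid, 1, k, k)
    temporal = conv3d_params(c_mid, c_out, k, 1, 1)
    return spatial + temporal


if __name__ == "__main__":
    # Hypothetical layer: 64 input and 64 output channels, 3 x 3 x 3 kernel.
    full = conv3d_params(64, 64, 3, 3, 3)
    sep = separable_conv3d_params(64, 64, 64, 3)
    print(full, sep, round(sep / full, 3))  # 110592 49152 0.444
```

For this hypothetical layer the factorized version uses about 44% of the weights of the full convolution. With stride 1 and the intermediate channel count equal to the output channel count, multiply-accumulates per output position shrink by the same factor.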

📝 Abstract
The increasing global crime rate, together with substantial human and property losses, highlights the limitations of traditional surveillance methods in promptly detecting diverse and unexpected acts of violence. To address this pressing need for automatic violence detection, we use machine learning to detect and categorize violent events in video streams. This paper introduces a comprehensive framework for violence detection and classification, employing supervised learning for both binary and multi-class violence classification. The detection model relies on 3D Convolutional Neural Networks, while the classification model uses a separable 3D convolutional model for feature extraction and a bidirectional LSTM for temporal processing. Training is conducted on diverse customized datasets with frame-level annotations, incorporating videos from surveillance cameras, human recordings, and the Hockey Fight, SOHAS, and WVD datasets across various platforms. Additionally, a camera module integrated with a Raspberry Pi captures a live video feed, which is sent to the ML model for processing. The resulting system demonstrates improved computational resource efficiency and accuracy.
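The bidirectional LSTM step can be illustrated with a toy model. The following is a sketch, not the authors' implementation. It uses a hypothetical `ScalarLSTM` with a one-dimensional input and hidden state, whereas the real model would consume feature vectors produced by the separable 3D CNN. One LSTM reads the per-frame features left to right and a second reads them right to left. The two hidden states are then paired per time step, so every position carries context from both earlier and later frames.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Gate = Tuple[float, float, float]  # (input weight, recurrent weight, bias)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class ScalarLSTM:
    """Hypothetical toy LSTM with a 1-D input and a 1-D hidden state."""
    wi: Gate  # input gate
    wf: Gate  # forget gate
    wo: Gate  # output gate
    wg: Gate  # candidate cell value

    def run(self, xs: List[float]) -> List[float]:
        """Return the hidden state after each input, in reading order."""
        h, c = 0.0, 0.0
        outputs = []
        for x in xs:
            i = sigmoid(self.wi[0] * x + self.wi[1] * h + self.wi[2])
            f = sigmoid(self.wf[0] * x + self.wf[1] * h + self.wf[2])
            o = sigmoid(self.wo[0] * x + self.wo[1] * h + self.wo[2])
            g = math.tanh(self.wg[0] * x + self.wg[1] * h + self.wg[2])
            c = f * c + i * g          # update the cell state
            h = o * math.tanh(c)       # expose a gated view of the cell
            outputs.append(h)
        return outputs


def bidirectional(fwd: ScalarLSTM, bwd: ScalarLSTM,
                  xs: List[float]) -> List[Tuple[float, float]]:
    """Pair forward and backward hidden states at each time step."""
    forward = fwd.run(xs)
    # Run over the reversed sequence, then flip back to original time order,
    # so backward[t] has seen frames t..end.
    backward = bwd.run(xs[::-1])[::-1]
    return list(zip(forward, backward))
```

In a full classifier, the paired states would typically be pooled (or taken at the final step) and passed to a softmax layer over the violence classes. That head is omitted here.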
Problem

Detect violent events in video streams using Machine Learning
Classify violence types with supervised learning and 3D CNNs
Enhance surveillance efficiency and accuracy with real-time processing
Innovation

Uses a 3D CNN for violence detection
Employs a separable 3D CNN with a BiLSTM for violence classification
Integrates a Raspberry Pi camera module to capture a live video feed
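The page does not describe how the detector and classifier are chained on the live feed. The following is a hedged sketch of one plausible arrangement: a sliding window moves over incoming frames, a binary detector runs on each window, and the multi-class classifier runs only on windows the detector flags. `run_pipeline`, `clip_len`, `stride`, and `threshold` are hypothetical names and values. The two models are passed in as plain callables standing in for the 3D CNN and for the separable 3D CNN + BiLSTM, and a Python iterable stands in for frames from the Raspberry Pi camera module.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Optional, Sequence, Tuple

Frame = Sequence[float]  # hypothetical: a flattened frame or per-frame feature
Clip = List[Frame]


def run_pipeline(
    frames: Iterable[Frame],
    detect: Callable[[Clip], float],   # stand-in for the 3D CNN detector: P(violent)
    classify: Callable[[Clip], str],   # stand-in for the separable 3D CNN + BiLSTM
    clip_len: int = 16,
    stride: int = 8,
    threshold: float = 0.5,
) -> Iterator[Tuple[int, Optional[str]]]:
    """Slide a fixed-length window over a frame stream. Once the window is
    full, evaluate it every `stride` frames: run the binary detector, and
    only if its score reaches `threshold` run the multi-class classifier.
    Yields (index of the clip's last frame, class label or None)."""
    window: Deque[Frame] = deque(maxlen=clip_len)  # oldest frames drop out
    for idx, frame in enumerate(frames):
        window.append(frame)
        if len(window) < clip_len:
            continue  # not enough frames for a full clip yet
        if (idx + 1 - clip_len) % stride != 0:
            continue  # skip until the next stride boundary
        clip = list(window)
        if detect(clip) >= threshold:
            yield idx, classify(clip)
        else:
            yield idx, None
```

Gating the classifier behind the detector keeps the heavier model idle on clips the detector considers non-violent, which is one way such a system could save compute on constrained hardware.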
Aritra Dutta
Assistant Professor, University of Central Florida
Optimization, Machine Learning, Signal Processing
Pushpita Boral
Department of Networking and Communications, School of Computing, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, 603203, India
G. Suseela
Department of Networking and Communications, School of Computing, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, 603203, India