🤖 AI Summary
To address the challenge of monitoring fall risk in older adults, this paper proposes a lightweight vision-based fall detection method. It uses a 3D CNN pre-trained on the Sports1M dataset as a frozen spatio-temporal feature extractor and trains only an SVM classifier on the extracted features, which reduces training cost and data requirements. Under stratified shuffle five-split cross-validation, the approach achieves an average accuracy exceeding 96% on both the GMDCSA and CAUCAFall datasets, outperforming 2D CNN baselines. The core contribution is an empirical validation of the frozen 3D CNN plus SVM pipeline, which combines high accuracy with low computational overhead on two datasets. The implementation code is publicly available.
📝 Abstract
Unintentional falls are a significant health issue for older adults, and the older population is growing steadily, so there is a need for automated fall detection and monitoring systems. This paper introduces a vision-based fall detection system that uses a pre-trained 3D CNN. Unlike a 2D CNN, a 3D CNN extracts temporal as well as spatial features. The proposed model uses the original weights of a 3D CNN pre-trained on the Sports1M dataset to extract spatio-temporal features, which are fed to an SVM classifier that labels each activity as a fall or an activity of daily living (ADL). Only the SVM classifier is trained, which avoids the time required to train the 3D CNN. Stratified shuffle-split cross-validation with five splits is used to divide the data into training and test sets. Experiments were conducted on two datasets, GMDCSA and CAUCAFall. The source code is available at https://github.com/ekramalam/HFD_3DCNN.
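
The stratified shuffle-split evaluation mentioned in the abstract can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function name `stratified_shuffle_splits`, the 20% test fraction, and the 40/60 class counts are all assumptions chosen for illustration. In practice, scikit-learn's `StratifiedShuffleSplit` provides the same kind of split, and for each split the SVM would be trained on the 3D CNN features of the training indices and scored on the test indices.

```python
import random
from collections import defaultdict


def stratified_shuffle_splits(labels, n_splits=5, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs that preserve class proportions.

    Hypothetical helper for illustration; the test fraction is an assumption,
    the abstract only states that five splits were used.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    for _ in range(n_splits):
        train_idx, test_idx = [], []
        # Shuffle each class independently, then cut off its test share,
        # so every split keeps the same fall/ADL ratio in train and test.
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            n_test = max(1, round(len(shuffled) * test_fraction))
            test_idx.extend(shuffled[:n_test])
            train_idx.extend(shuffled[n_test:])
        yield sorted(train_idx), sorted(test_idx)


# Hypothetical label list: 40 fall clips and 60 ADL clips.
labels = ["fall"] * 40 + ["adl"] * 60
splits = list(stratified_shuffle_splits(labels))

for number, (train_idx, test_idx) in enumerate(splits, start=1):
    n_fall = sum(labels[i] == "fall" for i in test_idx)
    print(f"split {number}: train={len(train_idx)} "
          f"test={len(test_idx)} test_falls={n_fall}")
```

Unlike k-fold cross-validation, each split is drawn independently, so a clip may appear in the test set of more than one split; the reported accuracy is then the mean over the five splits.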