Activity and Subject Detection for UCI HAR Dataset with & without missing Sensor Data

📅 2025-05-10
🤖 AI Summary
This work addresses two gaps in human activity recognition (HAR): subject identity is rarely modeled, and sensor data is often missing in practice. It proposes a lightweight LSTM-based model for the UCI HAR dataset that classifies both the activity and the subject performing it. It then simulates missing sensor data and evaluates several imputation strategies, with and without PCA dimensionality reduction, finding that KNN imputation without PCA performs best. The model reaches 93.89% accuracy for activity recognition across six activities (approaching the 96.67% benchmark) and 80.19% accuracy for subject identification across 30 subjects, which the authors present as a new baseline for subject recognition.

📝 Abstract
Current studies in Human Activity Recognition (HAR) primarily focus on classifying activities from sensor data, with little emphasis on recognizing the individuals performing these activities. Subject recognition is important for developing personalized and context-sensitive applications. Additionally, the issue of missing sensor data, which often occurs in practice due to hardware malfunctions, has not yet been explored. This paper seeks to fill these gaps by introducing a lightweight LSTM-based model that classifies both activities and subjects. Applied to the UCI HAR dataset [1], the model achieves an accuracy of 93.89% in activity recognition (across six activities), nearing the 96.67% benchmark, and an accuracy of 80.19% in subject recognition (involving 30 subjects), thereby establishing a new baseline for this area of research. We then simulate the absence of sensor data to mirror real-world scenarios and apply imputation techniques, both with and without Principal Component Analysis (PCA), to restore the incomplete datasets. We found that K-Nearest Neighbors (KNN) imputation without PCA performs best for filling in the missing sensor data; adding PCA resulted in slightly lower accuracy. These results demonstrate how well the framework handles missing sensor data, which is a major step forward in using the Human Activity Recognition dataset for reliable classification tasks.
Problem

Research questions and friction points this paper is trying to address.

Classify human activities and subjects using sensor data
Address missing sensor data in real-world HAR applications
Evaluate imputation techniques for incomplete datasets with/without PCA
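The abstract says missing sensor data is simulated, but this page does not say how. As a minimal sketch of one possible scheme, the snippet below assumes each value is dropped independently at random with a fixed rate. The function name `mask_at_random`, the `missing_rate` parameter, and the toy `readings` matrix are hypothetical and are not taken from the paper.

```python
import random

def mask_at_random(rows, missing_rate, seed=0):
    """Return a copy of rows in which each value is independently
    replaced by None with probability missing_rate.
    ASSUMPTION: the paper's actual masking scheme is not described here."""
    rng = random.Random(seed)
    return [
        [None if rng.random() < missing_rate else value for value in row]
        for row in rows
    ]

# Toy stand-in for sensor feature vectors (not real UCI HAR data).
readings = [[0.1 * i + 0.01 * j for j in range(6)] for i in range(50)]

masked = mask_at_random(readings, missing_rate=0.2)
n_missing = sum(value is None for row in masked for value in row)
print(f"{n_missing} of {len(readings) * 6} values masked")
```

Fixing the seed keeps the masked positions identical across runs, so different imputation strategies can be compared on the same pattern of gaps.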
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight LSTM model for activity and subject classification
KNN imputation handles missing sensor data effectively
PCA omitted because it slightly lowered accuracy
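To make the KNN imputation idea concrete, here is a small standard-library sketch: each missing value is filled with the mean of that feature over the k nearest rows in which it is observed. This is an illustration under stated assumptions, not the authors' implementation. The distance function skips missing coordinates and rescales by the fraction observed, which mirrors the `nan_euclidean` metric used by scikit-learn's `KNNImputer`; the names `nan_distance` and `knn_impute`, the choice of k, and the toy `rows` are hypothetical.

```python
import math

def nan_distance(a, b):
    """Euclidean distance over coordinates observed in both rows,
    rescaled to the full width; inf if no coordinate is shared."""
    shared = [(x - y) ** 2 for x, y in zip(a, b)
              if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(len(a) / len(shared) * sum(shared))

def knn_impute(rows, k=3):
    """Fill each None with the mean of that feature over the k nearest
    rows that have it observed. Values with no usable donor stay None."""
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where feature j is observed.
            donors = sorted(
                (nan_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            donors = [d for d in donors if d[0] != math.inf][:k]
            if donors:
                filled[i][j] = sum(v for _, v in donors) / len(donors)
    return filled

# Toy data (not UCI HAR): the second row is missing its middle feature.
rows = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [0.9, 2.2, 2.9],
    [5.0, 6.0, 7.0],
]
filled = knn_impute(rows, k=2)
print(filled[1])
```

With k=2 the gap is filled from the two nearby rows rather than the distant fourth row, which is the behavior that makes KNN imputation attractive when similar samples carry similar sensor values.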
Debashish Saha
Software Engineer at IBM TJ Watson Research Center
artificial intelligence
Piyush Malik
dept. of Comp. Sci., Stony Brook University, Stony Brook, USA
Adrika Saha
dept. of Comp. Eng., American International University Bangladesh, Dhaka, Bangladesh