🤖 AI Summary
This work addresses the challenge of real-time, biomechanically plausible human action recognition in industrial settings using standard 2D cameras. We propose an end-to-end framework that takes joint angles—not joint coordinates—as input, integrating human kinematic modeling and biomechanical priors into a lightweight Transformer architecture equipped with a temporal smoothing mechanism. This design significantly enhances robustness against pose variations, inter-subject anatomical differences, and camera viewpoint shifts, enabling truly low-latency online interaction. Evaluated on a custom industrial action dataset comprising 11 subjects, our method achieves 88% classification accuracy, outperforming mainstream real-time baselines. Furthermore, it successfully enables real-time closed-loop control of a simulated robot, demonstrating practical applicability in industrial automation scenarios.
📝 Abstract
This paper presents a novel framework for real-time human action recognition in industrial contexts using standard 2D cameras. We introduce a complete pipeline for robust, real-time estimation of human joint kinematics, fed into a temporally smoothed Transformer-based network for action recognition. We evaluate our approach on a new dataset of 11 subjects performing various actions. Unlike most of the literature, which relies on joint center positions (JCP) and operates offline, our method uses biomechanical priors, e.g., joint angles, for fast and robust real-time recognition. Moreover, joint angles make the proposed method agnostic to sensor and subject poses as well as to anthropometric differences, ensuring robustness across environments and subjects. Our learning model outperforms the best baseline that also runs in real time, across various metrics: it achieves 88% accuracy and generalizes well even to subjects not facing the cameras. Finally, we demonstrate the robustness and usefulness of our technique through an online interaction experiment, in which a simulated robot is controlled in real time via the recognized actions.
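To see why joint angles are invariant to subject size and camera distance where raw joint center positions are not, consider a minimal sketch (not the paper's pipeline; the keypoint values and function name are illustrative): the angle at a joint is computed from the two adjacent segments, so uniformly scaling all 2D keypoints leaves it unchanged.

```python
import math

def joint_angle(a, b, c):
    """Angle at joint b (in degrees) between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.acos(dot / (math.hypot(*v1) * math.hypot(*v2))))

# Elbow angle from shoulder/elbow/wrist 2D keypoints (hypothetical values)
shoulder, elbow, wrist = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
print(joint_angle(shoulder, elbow, wrist))  # 90.0

# A larger subject or closer camera scales all keypoints uniformly;
# the joint angle is unaffected, unlike the raw coordinates.
scaled = [(2.0 * x, 2.0 * y) for x, y in (shoulder, elbow, wrist)]
print(joint_angle(*scaled))  # 90.0
```

This scale invariance is what lets an angle-based representation transfer across subjects with different anthropometry and across camera setups without retraining.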