🤖 AI Summary
Existing submovement decomposition methods suffer from low reconstruction accuracy, high computational overhead, and poor generalization, primarily due to the scarcity of labeled training data. To address these limitations, we propose a real-time semi-supervised deep learning framework: the model is initialized on physically plausible synthetic data generated via the minimum-jerk principle, then iteratively refined on unlabeled real-world motion sequences through a differentiable reconstruction objective. Employing a fully convolutional architecture, our method achieves sub-millisecond inference (under 1 ms per second of input) and requires no manual annotations. Extensive experiments demonstrate significant improvements over state-of-the-art methods on both synthetic and real benchmarks, with strong robustness under noise corruption and in challenging tasks such as steering, handwriting, and pointing. Our approach effectively breaks the classical trade-off among accuracy, efficiency, and data dependency, making it suitable for practical applications including human–computer interaction and clinical rehabilitation assessment.
📝 Abstract
This paper introduces SSSUMO, a semi-supervised deep learning approach for submovement decomposition that achieves state-of-the-art accuracy and speed. While submovement analysis offers valuable insights into motor control, existing methods struggle with reconstruction accuracy, computational cost, and validation due to the difficulty of obtaining hand-labeled data. We address these challenges with a semi-supervised learning framework that learns from synthetic data, initially generated from minimum-jerk principles and then iteratively refined through adaptation to unlabeled human movement data. Our fully convolutional architecture with differentiable reconstruction significantly surpasses existing methods on both synthetic and diverse human motion datasets, demonstrating robustness even in high-noise conditions. Crucially, the model operates in real time (less than a millisecond per second of input), a substantial improvement over optimization-based techniques. This performance enables new applications in human-computer interaction, rehabilitation medicine, and motor control research. We demonstrate the model's effectiveness across diverse human-performed tasks such as steering, rotation, pointing, object moving, handwriting, and mouse-controlled gaming, showing notable improvements particularly on challenging datasets where traditional methods largely fail. Training and benchmarking source code, along with pre-trained model weights, are publicly available at https://github.com/dolphin-in-a-coma/sssumo.
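To make the synthetic-data idea concrete: a minimum-jerk movement has the well-known fifth-order polynomial position profile, whose derivative gives a smooth, bell-shaped velocity curve; overlapping several such curves yields a synthetic velocity trace with known submovement labels. The sketch below is illustrative only and not the paper's actual generator; the function names and the `(onset, duration, amplitude)` parameterization are assumptions for this example.

```python
import numpy as np

def min_jerk_velocity(t, duration, amplitude):
    """Velocity of a minimum-jerk submovement at times t in [0, duration].

    Position follows x(tau) = amplitude * (10*tau^3 - 15*tau^4 + 6*tau^5)
    with tau = t / duration; the expression below is its time derivative.
    """
    tau = np.clip(t / duration, 0.0, 1.0)
    return (amplitude / duration) * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)

def synthesize(submovements, fs=100, total_time=2.0):
    """Sum overlapping minimum-jerk submovements into one velocity trace.

    `submovements` is a list of (onset, duration, amplitude) tuples -- a toy
    stand-in for a labeled synthetic training sample. Returns (t, v) arrays.
    """
    t = np.arange(0.0, total_time, 1.0 / fs)
    v = np.zeros_like(t)
    for onset, duration, amplitude in submovements:
        mask = (t >= onset) & (t <= onset + duration)
        v[mask] += min_jerk_velocity(t[mask] - onset, duration, amplitude)
    return t, v
```

Because each submovement's displacement equals its amplitude, the integral of the summed velocity trace recovers the total displacement, which is one way such generators can be sanity-checked.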