🤖 AI Summary
Loosely worn inertial sensors introduce severe, structured, and location-dependent noise due to relative motion between the sensors and the body, significantly degrading the performance of conventional inertial motion capture systems. To address this challenge, this work proposes GID, a lightweight, plug-and-play Transformer framework that decouples denoising and pose estimation into three stages: location-specific denoising, adaptive cross-wear fusion, and general pose prediction. By combining a location-aware mixture-of-experts architecture with a shared spatio-temporal backbone, GID trains stably on limited paired data and generalizes well to unseen users, motions, and garment types. Experiments show that GID achieves real-time, high-accuracy denoising when trained on data from only a single user, substantially improving the performance of existing inertial motion capture methods.
📝 Abstract
Wearable inertial motion capture (MoCap) provides a portable, occlusion-free, and privacy-preserving alternative to camera-based systems, but its accuracy depends on tightly attached sensors, an intrusive and uncomfortable requirement for daily use. Embedding IMUs into loose-fitting garments is a desirable alternative, yet sensor-body displacement introduces severe, structured, and location-dependent corruption that breaks standard inertial pipelines. We propose GID (Garment Inertial Denoiser), a lightweight, plug-and-play Transformer that factorizes loose-wear MoCap into three stages: (i) location-specific denoising, (ii) adaptive cross-wear fusion, and (iii) general pose prediction. GID uses a location-aware expert architecture, where a shared spatio-temporal backbone models global motion while per-IMU expert heads specialize in local garment dynamics, and a lightweight fusion module ensures cross-part consistency. This inductive bias enables stable training and effective learning from limited paired loose-tight IMU data. We also introduce GarMoCap, a combined public and newly collected dataset covering diverse users, motions, and garments. Experiments show that GID enables accurate, real-time denoising from single-user training and generalizes across unseen users, motions, and garment types, consistently improving state-of-the-art inertial MoCap methods when used as a drop-in module.
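To make the decomposition described in the abstract concrete (shared backbone for global motion, per-IMU expert heads for local garment dynamics, lightweight fusion for cross-part consistency), here is a toy NumPy sketch. It is not the authors' implementation: the IMU count, feature sizes, and the dense-layer stand-ins for the Transformer backbone and expert heads are all assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: 6 body-worn IMUs, each emitting a 12-D reading
# (e.g. acceleration + orientation features), over T timesteps.
NUM_IMUS, FEAT, HIDDEN, T = 6, 12, 32, 100


def relu(x):
    return np.maximum(x, 0.0)


class GIDSketch:
    """Toy location-aware expert architecture: a shared backbone models
    global motion, per-IMU expert heads specialize per sensor location,
    and a fusion layer mixes the heads for cross-part consistency."""

    def __init__(self):
        # Shared "spatio-temporal backbone" (stand-in: one dense layer
        # over the concatenated readings of all IMUs).
        self.W_shared = rng.normal(0.0, 0.1, (NUM_IMUS * FEAT, HIDDEN))
        # One expert head per IMU location, denoising its own channel.
        self.W_experts = rng.normal(0.0, 0.1, (NUM_IMUS, HIDDEN, FEAT))
        # Lightweight fusion: mixes all per-IMU outputs together.
        self.W_fuse = rng.normal(0.0, 0.1, (NUM_IMUS * FEAT, NUM_IMUS * FEAT))

    def forward(self, x):
        # x: (T, NUM_IMUS, FEAT) noisy loose-wear IMU readings.
        h = relu(x.reshape(T, -1) @ self.W_shared)  # global context, (T, HIDDEN)
        per_imu = np.stack(
            [h @ self.W_experts[i] for i in range(NUM_IMUS)], axis=1
        )  # (T, NUM_IMUS, FEAT): each head reads the shared features
        fused = per_imu.reshape(T, -1) @ self.W_fuse  # cross-part mixing
        # Residual connection: predict a correction to the noisy input,
        # so an untrained model starts near the identity mapping.
        return x + fused.reshape(T, NUM_IMUS, FEAT)


model = GIDSketch()
noisy = rng.normal(size=(T, NUM_IMUS, FEAT))
denoised = model.forward(noisy)
print(denoised.shape)  # (100, 6, 12)
```

The point of the sketch is the factorization, not the layers themselves: in the paper the shared block is a Transformer and the heads are trained on paired loose-tight IMU data, but the data flow (shared features, then per-location experts, then fusion, as a drop-in denoiser ahead of any pose estimator) is the same shape.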