🤖 AI Summary
Millimeter-wave (mmWave) 3D pose estimation suffers from signal sparsity and weak reflections, leading models to over-rely on statistical priors—thereby degrading performance on downstream tasks such as gesture recognition. To address this, we propose mmJoints, the first framework to explicitly model two complementary joint-level descriptors: “perceptibility probability” and “positional reliability.” This explicit modeling renders model bias transparent, enhancing both interpretability and robustness for downstream applications. Our method leverages a pre-trained black-box pose estimator and integrates signal analysis with probabilistic modeling to generate auxiliary confidence descriptors—without modifying the backbone architecture. Evaluated across 13 diverse pose scenarios, mmJoints achieves descriptor estimation error below 4.2%, improves joint localization accuracy by 12.5%, and boosts action recognition accuracy by 16%. This work establishes a novel paradigm for trustworthy mmWave-based pose estimation under sparse sensing conditions.
📝 Abstract
In mmWave-based pose estimation, sparse signals and weak reflections often cause models to infer body joints from statistical priors rather than sensor data. While prior knowledge helps in learning meaningful representations, over-reliance on it degrades performance in downstream tasks like gesture and activity recognition. In this paper, we introduce mmJoints, a framework that augments a pre-trained, black-box mmWave-based 3D pose estimator's output with additional joint descriptors. Rather than mitigating bias, mmJoints makes it explicit by estimating the likelihood of a joint being sensed and the reliability of its predicted location. These descriptors enhance interpretability and improve downstream task accuracy. Through extensive evaluations using over 115,000 signal frames across 13 pose estimation settings, we show that mmJoints estimates descriptors with an error rate below 4.2%. mmJoints also improves joint position accuracy by up to 12.5% and boosts activity recognition by up to 16% over state-of-the-art methods.