🤖 AI Summary
Existing human mesh recovery methods often produce inaccurate poses and temporal jitter under occlusion, because image features for occluded body parts are unreliable. This work proposes MoPO, a framework that uses the motion prior contained in pose sequences to handle occlusion. A spatial-temporal occlusion detector estimates joint visibility, and a lightweight motion predictor fills in occluded joints with the most plausible positions given the history poses. The completed joint sequence is then fused with image features to estimate human shape and an initial pose, and it is used again in an inverse kinematics refinement step that supplies an occlusion-free motion prior for the final pose. The authors report state-of-the-art accuracy and temporal consistency on both occlusion-specific and standard benchmarks.
📝 Abstract
Although recent studies have made remarkable progress in human mesh recovery, they still exhibit limited robustness to occlusions and often produce inaccurate poses and severe motion jitter because the spatial features available for occluded body parts are insufficient. Inspired by rapid advances in human motion prediction, we find that, compared with occluded image features, a pose sequence inherently contains a reliable motion prior for estimating occluded body parts. In this paper, we incorporate a Motion Prior for Occluded human mesh recovery, in a method called MoPO. MoPO consists mainly of two components: 1) the motion de-occlusion module, in which a spatial-temporal occlusion detector estimates joint visibility and a lightweight motion predictor completes the occluded body parts by predicting the most plausible joint positions from history poses; 2) the motion-aware fusion and refinement module, which fuses the completed joint sequence with image features to estimate human shape and an initial human pose. Moreover, the completed joint sequence is further used to refine the final human pose through inverse kinematics, which provides an occlusion-free motion prior for regressing human poses. Extensive experiments demonstrate that MoPO achieves state-of-the-art performance on both occlusion-specific and standard benchmarks, significantly enhancing the accuracy and temporal consistency of occluded human mesh recovery. Our code and demo can be found in the supplementary material.
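To make the motion de-occlusion idea concrete, here is a minimal sketch of completing occluded joints from history poses. It is not MoPO's method. The abstract describes a learned motion predictor and a learned occlusion detector, whereas this sketch substitutes a constant-velocity extrapolation over the last two frames and takes the visibility flags as given. The function name `complete_occluded_joints`, the 3D tuple joint layout, and all argument names are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]  # assumed (x, y, z) layout, for illustration only


def complete_occluded_joints(
    history: Sequence[Sequence[Joint]],
    current: Sequence[Joint],
    visible: Sequence[bool],
) -> List[Joint]:
    """Fill occluded joints of `current` from the two most recent history frames.

    Hypothetical stand-in for a learned motion predictor: an occluded joint is
    extrapolated as p_t = p_{t-1} + (p_{t-1} - p_{t-2}). Visible joints are kept.
    """
    if len(history) < 2:
        raise ValueError("need at least two history frames")
    prev2, prev1 = history[-2], history[-1]

    completed: List[Joint] = []
    for j, (joint, is_visible) in enumerate(zip(current, visible)):
        if is_visible:
            # Trust the image-derived estimate for visible joints.
            completed.append(tuple(joint))
        else:
            # Constant-velocity guess replaces the unreliable occluded estimate.
            completed.append(
                tuple(2.0 * p1 - p2 for p1, p2 in zip(prev1[j], prev2[j]))
            )
    return completed


# Two joints over two history frames; the second joint is occluded now.
history = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.5, 0.0, 0.0), (1.0, 1.5, 1.0)],
]
current = [(1.0, 0.0, 0.0), (9.0, 9.0, 9.0)]  # second entry is a corrupted estimate
visible = [True, False]
completed = complete_occluded_joints(history, current, visible)
```

In this toy input the visible joint is passed through unchanged, while the occluded joint's corrupted estimate is discarded in favor of the position implied by its recent motion. A learned predictor, as the abstract describes, would replace the extrapolation rule while keeping the same inputs and outputs.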