🤖 AI Summary
To address the limited temporal robustness of video-based human pose estimation under image degradations (e.g., occlusion, motion blur), this paper proposes a framework that jointly models semantic dynamic evolution and spatiotemporal collaboration. The method introduces: (1) a learnable semantic state transition module that explicitly captures the inter-frame semantic evolution of joint states; and (2) a bidirectional spatiotemporal graph co-propagation mechanism that integrates GCN-based spatial modeling with GRU-based temporal modeling, enhanced by semantic attention and cross-frame topological consistency constraints. Evaluated on PoseTrack18 and JTA, the approach achieves 78.3% and 82.6% mAP, respectively, and reduces temporal jitter by 37%, outperforming current state-of-the-art methods.
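To make the combination of GCN-based spatial modeling and bidirectional GRU-based temporal modeling concrete, the following is a minimal NumPy sketch and not the paper's implementation. It assumes a standard symmetrically normalized graph convolution and a textbook GRU cell shared across joints. All names (`normalize_adjacency`, `gru_step`, `bidirectional_st_pass`), the layer sizes, and the random weights are hypothetical. The semantic state transition module, semantic attention, and topological consistency constraints are omitted because the summary does not specify them.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetrically normalize A + I, as in a standard GCN layer (assumed form)."""
    a_hat = adj + np.eye(adj.shape[0])
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(rng, in_dim, hidden):
    """Hypothetical helper: small random weights for a bias-free GRU cell."""
    shapes = [(in_dim, hidden), (hidden, hidden)] * 3
    return [rng.normal(scale=0.1, size=s) for s in shapes]

def gru_step(x, h, params):
    """One GRU update applied to every joint (each row of x and h) at once."""
    wz, uz, wr, ur, wh, uh = params
    z = sigmoid(x @ wz + h @ uz)              # update gate
    r = sigmoid(x @ wr + h @ ur)              # reset gate
    h_tilde = np.tanh(x @ wh + (r * h) @ uh)  # candidate state
    return (1.0 - z) * h + z * h_tilde

def bidirectional_st_pass(frames, adj, w_gcn, params_fwd, params_bwd):
    """frames: (T, J, C) joint features. Returns (T, J, 2 * H) features."""
    a_norm = normalize_adjacency(adj)
    # Spatial step: one graph convolution over the skeleton in every frame.
    spatial = np.maximum(a_norm @ frames @ w_gcn, 0.0)  # (T, J, H)
    num_frames, num_joints, hidden = spatial.shape
    fwd = np.zeros_like(spatial)
    bwd = np.zeros_like(spatial)
    # Temporal step: propagate joint states forward in time...
    h = np.zeros((num_joints, hidden))
    for t in range(num_frames):
        h = gru_step(spatial[t], h, params_fwd)
        fwd[t] = h
    # ...and backward in time, so each frame sees past and future context.
    h = np.zeros((num_joints, hidden))
    for t in reversed(range(num_frames)):
        h = gru_step(spatial[t], h, params_bwd)
        bwd[t] = h
    return np.concatenate([fwd, bwd], axis=-1)

# Toy usage: 5 frames, a 4-joint chain skeleton, 2-D coordinates per joint.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
frames = rng.normal(size=(5, 4, 2))
hidden = 8
w_gcn = rng.normal(scale=0.1, size=(2, hidden))
features = bidirectional_st_pass(
    frames, adj, w_gcn, init_gru(rng, hidden, hidden), init_gru(rng, hidden, hidden)
)
print(features.shape)
```

In this sketch each frame's output concatenates a forward and a backward hidden state, which is one plausible way for a frame degraded by occlusion or blur to borrow evidence from its neighbors on both sides.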