🤖 AI Summary
To address the severe robustness degradation of existing 6D pose tracking methods when the camera and the object both move at high speed, this paper proposes a real-time, retraining-free tracking framework. Methodologically, it integrates visual-inertial odometry (VIO) for motion compensation, combines depth-aware 2D tracking with VIO-guided Kalman filtering to establish a closed-loop system, and introduces hierarchical pose refinement for joint geometric and sensor-data optimization. To our knowledge, this is the first approach achieving stable 6D pose tracking under large dynamic displacements and high rotational velocities without scene-specific retraining. Evaluated on both synthetic and real-world datasets, the method outperforms state-of-the-art approaches, running in real time at over 30 FPS and reducing pose estimation error by over 35%. Crucially, it maintains high accuracy and convergence stability even under rapid motion.
📝 Abstract
We present DynamicPose, a retraining-free 6D pose tracking framework that improves tracking robustness in scenarios where both the camera and the object move fast. Previous work is mainly applicable to static or quasi-static scenes, and its performance deteriorates significantly when both the object and the camera move rapidly. To overcome these challenges, we propose three synergistic components: (1) a visual-inertial odometry (VIO) module compensates for the shift in the Region of Interest (ROI) caused by camera motion; (2) a depth-informed 2D tracker corrects ROI deviations caused by large object translation; (3) a VIO-guided Kalman filter predicts object rotation and generates multiple candidate poses, from which the final pose is obtained by hierarchical refinement. The 6D pose tracking results guide subsequent 2D tracking and Kalman filter updates, forming a closed-loop system that ensures accurate pose initialization and precise pose tracking. Simulation and real-world experiments demonstrate the effectiveness of our method, which achieves real-time and robust 6D pose tracking for fast-moving cameras and objects.
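The first component, compensating the ROI shift caused by camera motion, can be illustrated with a small geometric sketch. This is not the authors' implementation: it assumes a pinhole camera, an object that stays fixed in the world between two frames, and a relative camera transform (rotation `R_cur_prev`, translation `t_cur_prev`) supplied by some VIO system. All function and parameter names are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def transform_point(R: Sequence[Sequence[float]], t: Vec3, p: Vec3) -> Vec3:
    """Map a point from the previous camera frame to the current one: R @ p + t."""
    x, y, z = (
        sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)
    )
    return (x, y, z)


def project(p: Vec3, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = p
    if z <= 0.0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def compensate_roi_center(
    p_prev: Vec3,
    R_cur_prev: Sequence[Sequence[float]],
    t_cur_prev: Vec3,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> Tuple[float, float]:
    """Predict where the ROI center lands in the current image.

    p_prev is the object's position in the previous camera frame (for example,
    taken from the last 6D pose). The relative transform is assumed to come
    from VIO; the object itself is assumed not to move in this step.
    """
    p_cur = transform_point(R_cur_prev, t_cur_prev, p_prev)
    return project(p_cur, fx, fy, cx, cy)
```

For example, with an object 2 m in front of the previous camera, no rotation, and a camera that moved 0.5 m along its +x axis (so `t_cur_prev = (-0.5, 0, 0)`), the predicted ROI center with `fx = fy = 600`, `cx = 320`, `cy = 240` moves from pixel (320, 240) to (170, 240). Object translation is not handled here; in the abstract that residual is corrected by the depth-informed 2D tracker.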