🤖 AI Summary
Robust zero-shot 6D pose estimation and real-time tracking of novel objects on edge devices remain challenging under complex illumination and cluttered scenes. Method: We propose a lighting-invariant color-pair feature representation that unifies initial pose estimation and motion tracking. Initial estimation leverages cross-modal feature matching between RGB-D images and 3D mesh models; tracking employs a lightweight temporal correspondence verification module to enforce motion consistency efficiently. Contribution/Results: The method requires no object-specific training data, enabling zero-shot generalization. Evaluated on standard benchmarks, it achieves competitive accuracy while maintaining real-time performance (>30 FPS) and strong robustness against severe pose changes and illumination variations. Its computational efficiency and robustness make it well suited to edge deployment.
📝 Abstract
Robust 6D pose estimation of novel objects under difficult illumination remains an open problem, and existing approaches often trade accurate initial pose estimation against efficient real-time tracking. We present a unified framework, explicitly designed for efficient execution on edge devices, that combines a robust initial estimation module with a fast motion-based tracker. The key to our approach is a shared, lighting-invariant color-pair feature representation that forms a consistent foundation for both stages. For initial estimation, this feature facilitates robust registration between the live RGB-D view and the object's 3D mesh. For tracking, the same feature logic validates temporal correspondences, enabling a lightweight model to reliably regress the object's motion. Extensive experiments on benchmark datasets demonstrate that our integrated approach is both effective and robust, providing competitive pose estimation accuracy while maintaining high-fidelity tracking even through abrupt pose changes.
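The abstract does not define the color-pair feature, so the following is only an illustrative sketch of what a lighting-invariant descriptor for a pair of colored 3D points could look like. It assumes log-chromaticity (log(R/G), log(B/G)) as a stand-in color encoding, which is unchanged when a point's RGB values are scaled by a common brightness factor. The function names `log_chromaticity` and `color_pair_feature`, the 5-D layout, and the choice of encoding are all hypothetical and are not taken from the paper.

```python
import numpy as np


def log_chromaticity(rgb, eps=1e-6):
    """Map an RGB triple to 2-D log-chromaticity (log(R/G), log(B/G)).

    Assumed stand-in encoding: scaling R, G and B by the same positive
    factor leaves both ratios, and hence the output, unchanged.
    """
    rgb = np.maximum(np.asarray(rgb, dtype=np.float64), eps)  # avoid log(0)
    return np.array([np.log(rgb[0] / rgb[1]), np.log(rgb[2] / rgb[1])])


def color_pair_feature(p1, c1, p2, c2):
    """Hypothetical 5-D descriptor for a pair of colored 3D points.

    Layout: [Euclidean distance, chromaticity of point 1 (2 values),
             chromaticity of point 2 (2 values)].
    The distance is invariant to rigid motion; the color terms are
    invariant to a uniform change in brightness.
    """
    dist = np.linalg.norm(np.asarray(p2, dtype=np.float64)
                          - np.asarray(p1, dtype=np.float64))
    return np.concatenate([[dist], log_chromaticity(c1), log_chromaticity(c2)])


# The same point pair observed under bright and dim lighting
# (all colors scaled by 0.4) should yield matching descriptors.
f_bright = color_pair_feature([0, 0, 0], [0.8, 0.4, 0.2],
                              [0.1, 0, 0], [0.2, 0.6, 0.9])
f_dim = color_pair_feature([0, 0, 0], [0.32, 0.16, 0.08],
                           [0.1, 0, 0], [0.08, 0.24, 0.36])
print(np.allclose(f_bright, f_dim))
```

Under these assumptions, the same descriptor could be computed on points sampled from the mesh and on points from the live RGB-D frame, so that matching descriptors propose correspondences for registration. In a tracking step, a correspondence between consecutive frames could be rejected when its descriptors disagree. How the paper actually builds and uses its feature may differ.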