🤖 AI Summary
This work addresses the loss of metric localization accuracy in cross-device scenarios, where device heterogeneity and modality mismatch between map and query images degrade results. The authors propose a dual-branch localization architecture that combines geometric and neural approaches: one branch pairs feature fusion with PnP solving, while the other uses the MapAnything network for geometry-conditioned feed-forward localization. The method adds two components: a neural-guided pruning step that filters unreliable candidate map frames, and a depth-conditioned localization module that refines metric scale and translation. The authors report significant improvements in recall and accuracy on the HYDRO and SUCCU benchmarks, and a final challenge score of 92.62% (R@0.5m, 5°).
📝 Abstract
We present a hybrid cross-device localization pipeline developed for the CroCoDL 2025 Challenge. Our approach integrates a shared retrieval encoder and two complementary localization branches: a classical geometric branch using feature fusion and PnP, and a neural feed-forward branch (MapAnything) for metric localization conditioned on geometric inputs. A neural-guided candidate pruning strategy further filters unreliable map frames based on translation consistency, while depth-conditioned localization refines metric scale and translation precision on Spot scenes. These components jointly lead to significant improvements in recall and accuracy across both HYDRO and SUCCU benchmarks. Our method achieved a final score of 92.62 (R@0.5m, 5°) during the challenge.
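The reported score uses the notation R@0.5m, 5°. A common reading of this notation, assumed here, is the percentage of query images whose estimated pose falls within 0.5 m of translation error and 5° of rotation error relative to ground truth. The sketch below illustrates that reading only. The function names and the pose representation (a 3x3 rotation matrix plus a 3-vector position) are hypothetical, and how the challenge aggregates the score across devices or scenes is not stated in the abstract.

```python
import math

def translation_error(t_est, t_gt):
    """Euclidean distance in metres between estimated and true positions."""
    return math.dist(t_est, t_gt)

def rotation_error_deg(R_est, R_gt):
    """Angle in degrees of the relative rotation R_est^T @ R_gt."""
    # trace(A^T B) equals the element-wise sum of A * B.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against round-off pushing the cosine outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def recall_at(poses_est, poses_gt, max_trans_m=0.5, max_rot_deg=5.0):
    """Percentage of queries within both thresholds (assumed metric reading)."""
    hits = 0
    for (R_e, t_e), (R_g, t_g) in zip(poses_est, poses_gt):
        if (translation_error(t_e, t_g) <= max_trans_m
                and rotation_error_deg(R_e, R_g) <= max_rot_deg):
            hits += 1
    return 100.0 * hits / len(poses_gt)

def rot_z(deg):
    """Helper for the toy example: rotation about the z axis."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Toy data (made up for illustration, not from the paper).
gt = [(rot_z(0), [0.0, 0.0, 0.0])] * 4
est = [
    (rot_z(2), [0.3, 0.0, 0.0]),   # within both thresholds
    (rot_z(10), [0.1, 0.0, 0.0]),  # rotation error too large
    (rot_z(1), [0.6, 0.0, 0.0]),   # translation error too large
    (rot_z(0), [0.0, 0.0, 0.0]),   # exact
]
score = recall_at(est, gt)
```

A pose counts only if it passes both thresholds at once, so a method with accurate orientation but poor metric scale still scores low. This is consistent with the abstract's emphasis on refining metric scale and translation precision.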