🤖 AI Summary
This work addresses the severe performance degradation of existing deep image matching methods under large in-plane rotations. Through a systematic investigation of where rotation invariance is best incorporated within a sparse feature matching pipeline, large-scale training, and evaluation across multiple benchmarks, the study demonstrates that introducing rotation invariance solely at the descriptor stage achieves robustness comparable to handling it in the matcher while being more computationally efficient. Moreover, it shows that, given sufficient training data, rotation invariance does not compromise matching performance on upright images, and it highlights the critical role of data scale in enabling robust generalization to rotated inputs. The released models achieve state-of-the-art results on benchmarks including WxBS, HardMatch, and SatAst, substantially improving matching robustness across multimodal, extreme-viewpoint, and satellite imagery scenarios.
📝 Abstract
Finding matching keypoints between images is a core problem in 3D computer vision, yet modern matchers struggle with large in-plane rotations. A straightforward mitigation is to learn rotation invariance via data augmentation; however, it remains unclear at which stage of the pipeline rotation invariance should be incorporated. In this paper, we study this question in the context of a modern sparse matching pipeline. We perform extensive experiments by training on a large collection of 3D vision datasets and evaluating on popular image matching benchmarks. Surprisingly, we find that incorporating rotation invariance already in the descriptor yields performance similar to handling it in the matcher. Moreover, rotation invariance is achieved earlier in the matcher when it is learned in the descriptor, which allows for a faster rotation-invariant matcher. Further, we find that enforcing rotation invariance does not hurt performance on upright images when training at scale. Finally, we study the emergence of rotation invariance with scale and find that increasing the training data size substantially improves generalization to rotated images. We release two matchers robust to in-plane rotations that achieve state-of-the-art performance on, among others, multi-modal (WxBS), extreme (HardMatch), and satellite (SatAst) image matching. Code is available at https://github.com/davnords/loma.
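To make the rotation-augmentation idea mentioned in the abstract concrete, the sketch below shows one common way to implement it in PyTorch: rotate a square training crop by a random in-plane angle and map the ground-truth keypoints with the corresponding (inverse) rotation so that supervision stays consistent. This is our own minimal illustration, not the released LoMa training code; the function name and the square-crop assumption are ours.

```python
import math
import torch
import torch.nn.functional as F

def rotate_pair(img: torch.Tensor, kpts: torch.Tensor, theta: float):
    """Rotation augmentation for one training image (assumes square crops).

    img:   (B, C, H, W) tensor with H == W.
    kpts:  (N, 2) keypoints in (x, y) pixel coordinates of `img`.
    theta: rotation angle in radians.
    Returns the rotated images and the keypoints mapped into them, so that
    ground-truth correspondences remain valid after the augmentation.
    """
    B, _, H, W = img.shape
    assert H == W, "square crops keep normalized grid coordinates isotropic"
    c, s = math.cos(theta), math.sin(theta)

    # Sampling-grid transform: output(p) = input(R p) in normalized coords.
    R = torch.tensor([[c, -s, 0.0],
                      [s,  c, 0.0]])
    grid = F.affine_grid(R.unsqueeze(0).expand(B, -1, -1), list(img.shape),
                         align_corners=False)
    rotated = F.grid_sample(img, grid, align_corners=False)

    # A feature at input location q appears at output location R^{-1} q,
    # so keypoints move with the inverse rotation about the image center.
    center = torch.tensor([(W - 1) / 2.0, (H - 1) / 2.0])
    R_inv = torch.tensor([[c, s],
                          [-s, c]])  # transpose of the 2x2 rotation block
    kpts_rot = (kpts - center) @ R_inv.T + center
    return rotated, kpts_rot
```

Sampling a random `theta` per batch and applying such a transform to one image of each training pair, together with the matching update to the ground-truth correspondences, is the standard recipe for exposing a descriptor or a matcher to in-plane rotations during training.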