🤖 AI Summary
This work addresses the long-standing limitations of local feature matching—constrained by small-scale datasets and saturated benchmarks—in handling challenging real-world image pairs. We propose LoMa, a novel approach that combines large-scale diverse training data, modern training recipes, high-capacity neural networks, and scaled compute. To overcome shortcomings of existing evaluations, we introduce HardMatch, a new benchmark of 1,000 difficult image pairs with manually annotated ground-truth correspondences. LoMa significantly outperforms state-of-the-art methods across multiple benchmarks, including HardMatch, WxBS, InLoc, RUBIK, and IMC 2022, achieving gains of up to +29.5 mAA and advancing the field toward more realistic and demanding scenarios.
📝 Abstract
Local feature matching has long been a fundamental component of 3D vision systems such as Structure-from-Motion (SfM), yet progress has lagged behind the rapid advances of modern data-driven approaches. Newer approaches, such as feed-forward reconstruction models, have benefited extensively from scaling dataset sizes, whereas local feature matching models are still trained on only a few mid-sized datasets. In this paper, we revisit local feature matching from a data-driven perspective. Our approach, which we call LoMa, combines large and diverse data mixtures, modern training recipes, scaled model capacity, and scaled compute, yielding remarkable gains in performance. Since current standard benchmarks mainly rely on collecting sparse views from successful 3D reconstructions, the evaluation of progress in feature matching has been limited to relatively easy image pairs. To address the resulting benchmark saturation, we collect 1,000 highly challenging image pairs from internet data into a new dataset called HardMatch. Ground-truth correspondences for HardMatch are obtained via manual annotation by the authors. In our extensive benchmarking suite, we find that LoMa makes outstanding progress across the board, outperforming the state-of-the-art method ALIKED+LightGlue by +18.6 mAA on HardMatch, +29.5 mAA on WxBS, +21.4 (1m, 10$^\circ$) on InLoc, +24.2 AUC on RUBIK, and +12.4 mAA on IMC 2022. We release our code and models publicly at https://github.com/davnords/LoMa.