🤖 AI Summary
Existing dense feature matching methods suffer from low accuracy and poor robustness in complex real-world scenarios, while high-accuracy models incur prohibitive inference latency and memory overhead. To address these limitations, we propose an efficient and robust dense matching framework. First, we leverage DINOv3 to extract unbiased, generalizable feature representations. Second, we design a decoupled two-stage architecture comprising coarse-level matching followed by differentiable refinement, augmented with a novel matching loss function. Third, we develop lightweight, custom CUDA kernels that significantly reduce GPU memory consumption and accelerate inference. Evaluated on multiple standard benchmarks, our method achieves state-of-the-art accuracy while striking a favorable trade-off between inference speed and memory efficiency. The source code is publicly available.
📝 Abstract
Dense feature matching aims to estimate all correspondences between two images of a 3D scene and has recently been established as the gold standard due to its high accuracy and robustness. However, existing dense matchers still fail or perform poorly in many hard real-world scenarios, and high-precision models are often slow, limiting their applicability. In this paper, we attack these weaknesses on a wide front through a series of systematic improvements that together yield a significantly better model. In particular, we construct a novel matching architecture and loss, which, combined with a curated diverse training distribution, enables our model to solve many complex matching tasks. We further make training faster through a decoupled two-stage matching-then-refinement pipeline, and at the same time significantly reduce refinement memory usage through a custom CUDA kernel. Finally, we leverage the recent DINOv3 foundation model, along with multiple other insights, to make the model more robust and unbiased. In our extensive set of experiments, we show that the resulting novel matcher sets a new state-of-the-art, being significantly more accurate than its predecessors. Code is available at https://github.com/Parskatt/romav2
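To make the matching-then-refinement idea concrete, here is a minimal, generic sketch of the *coarse* stage only: mutual-nearest-neighbour matching between two sets of dense descriptors. This is not the paper's actual architecture (which uses learned DINOv3 features, a dedicated loss, and a differentiable refinement stage); the function name `coarse_match` and the toy descriptors are illustrative assumptions.

```python
import numpy as np

def coarse_match(feats_a, feats_b):
    """Mutual-nearest-neighbour matching on L2-normalised descriptors.

    feats_a: (Na, C) array, feats_b: (Nb, C) array.
    Returns an (M, 2) array of index pairs (i, j) such that feature i of
    image A and feature j of image B are each other's nearest neighbour.
    """
    # Normalise so the dot product is cosine similarity.
    fa = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    fb = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = fa @ fb.T                     # (Na, Nb) similarity matrix
    nn_ab = sim.argmax(axis=1)          # best B index for each A feature
    nn_ba = sim.argmax(axis=0)          # best A index for each B feature
    # Keep only mutual matches: A -> B -> back to the same A.
    mutual = nn_ba[nn_ab] == np.arange(feats_a.shape[0])
    ia = np.nonzero(mutual)[0]
    return np.stack([ia, nn_ab[ia]], axis=1)

# Toy example: image B's descriptors are a permuted copy of image A's.
A = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]], dtype=float)
B = A[[2, 0, 1]]                        # rows of A in a new order
matches = coarse_match(A, B)
print(matches.tolist())                 # → [[0, 1], [1, 2], [2, 0]]
```

In a full dense matcher these coarse index pairs would then be handed to a refinement stage that predicts sub-pixel offsets; restricting refinement to local neighbourhoods around coarse matches is what makes the decoupled two-stage design fast and memory-friendly.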