MV-RoMa: From Pairwise Matching into Multi-View Track Reconstruction

📅 2026-03-29
🤖 AI Summary
This work addresses the limitations of existing pairwise image matching methods, which often yield fragmented and geometrically inconsistent correspondences in multi-view settings, hindering high-quality 3D reconstruction. To overcome this, we propose MV-RoMa, a dense multi-view matching model that jointly estimates dense correspondences from a source image to multiple co-visible images, enabling coherent multi-view track reconstruction. Our approach introduces a multi-view encoder, which uses pairwise matches as a geometric prior, and a multi-view matching refiner, which refines correspondences with pixel-wise attention; together they enhance consistency while avoiding the high computational cost of full cross-attention. Additionally, we design an SfM-oriented post-processing strategy that integrates the resulting correspondences into tracks. Experiments across diverse, challenging benchmarks show that MV-RoMa produces more reliable correspondences and substantially denser, more accurate 3D reconstructions than existing sparse and dense matching methods.
📝 Abstract
Establishing consistent correspondences across images is essential for 3D vision tasks such as structure-from-motion (SfM), yet most existing matchers operate in a pairwise manner, often producing fragmented and geometrically inconsistent tracks when their predictions are chained across views. We propose MV-RoMa, a multi-view dense matching model that jointly estimates dense correspondences from a source image to multiple co-visible targets. Specifically, we design an efficient model architecture that avoids the high computational cost of full cross-attention for multi-view feature interaction: (i) a multi-view encoder that leverages pairwise matching results as a geometric prior, and (ii) a multi-view matching refiner that refines correspondences using pixel-wise attention. Additionally, we propose a post-processing strategy that integrates our model's consistent multi-view correspondences as high-quality tracks for SfM. Across diverse and challenging benchmarks, MV-RoMa produces more reliable correspondences and substantially denser, more accurate 3D reconstructions than existing sparse and dense matching methods. Project page: https://icetea-cv.github.io/mv-roma/.
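To make the failure mode of chained pairwise matching concrete, here is a minimal sketch of how pairwise matches are commonly merged into tracks with union-find, and how a geometrically inconsistent track can be detected. This is not MV-RoMa's method or its post-processing strategy; the function name `build_tracks`, the `(image_id, keypoint_id)` observation format, and the toy matches are all assumptions made for illustration.

```python
from collections import defaultdict


def build_tracks(pairwise_matches):
    """Chain pairwise matches into tracks with union-find (illustrative only).

    pairwise_matches: iterable of ((img_a, kp_a), (img_b, kp_b)) tuples,
    a hypothetical format for matched observations.
    Returns (consistent, conflicting): lists of tracks, where a track is a
    sorted list of (image_id, keypoint_id) observations.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the two observations of every pairwise match into one set.
    for a, b in pairwise_matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Group observations by the root of their set.
    groups = defaultdict(list)
    for obs in list(parent):
        groups[find(obs)].append(obs)

    consistent, conflicting = [], []
    for members in groups.values():
        members.sort()
        images = [img for img, _ in members]
        # A track that sees the same image twice cannot be one 3D point.
        if len(set(images)) == len(images):
            consistent.append(members)
        else:
            conflicting.append(members)
    return sorted(consistent), sorted(conflicting)


# Toy example: the chain A1 -> B1 -> C1 disagrees with the direct match A1 -> C2.
matches = [
    (("A", 0), ("B", 0)),
    (("B", 0), ("C", 0)),
    (("A", 1), ("B", 1)),
    (("B", 1), ("C", 1)),
    (("A", 1), ("C", 2)),
]
consistent, conflicting = build_tracks(matches)
print(consistent)
print(conflicting)
```

In the toy input, the second track ends up with two different keypoints in image C, which is the kind of inconsistency that arises when independently predicted pairwise matches are chained. Pipelines that build tracks this way typically have to drop or split such tracks, which contributes to fragmentation.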
Problem

Research questions and friction points this paper is trying to address.

multi-view matching
geometric consistency
3D reconstruction
structure-from-motion
correspondence fragmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-view matching
dense correspondence
structure-from-motion
geometric consistency
pixel-wise attention
Jongmin Lee
KAIST
Seungyeop Kang
Seoul National University
Sungjoo Yoo
Seoul National University