🤖 AI Summary
This work addresses the challenging problem of segment matching across wide-baseline images, particularly under extreme viewpoint variations (up to 180°), occlusion, and illumination changes. To this end, it is the first to introduce geometric inductive biases from 3D foundation models into segment matching. Methodologically, it proposes an architecture that uses the spatial understanding of these 3D foundation models to establish cross-image region correspondences that are both semantically coherent and geometrically consistent. The core contribution lies in exploiting this geometric inductive bias, which substantially enhances matching robustness under large viewpoint shifts. On the ScanNet++ and Replica benchmarks, the method improves AUPRC by up to 30% over prior state-of-the-art approaches, including the SAM2 video propagator and local feature matching methods. Furthermore, it improves performance on downstream 3D instance segmentation and image-goal navigation.
📝 Abstract
Segment matching is an important intermediate task in computer vision that establishes correspondences between semantically or geometrically coherent regions across images. Unlike keypoint matching, which focuses on localized features, segment matching captures structured regions, offering greater robustness to occlusions, lighting variations, and viewpoint changes. In this paper, we leverage the spatial understanding of 3D foundation models to tackle wide-baseline segment matching, a challenging setting involving extreme viewpoint shifts. We propose an architecture that uses the inductive bias of these 3D foundation models to match segments across image pairs with up to 180-degree viewpoint change. Extensive experiments show that our approach outperforms state-of-the-art methods, including the SAM2 video propagator and local feature matching methods, by up to 30% on the AUPRC metric on the ScanNet++ and Replica datasets. We further demonstrate the benefits of the proposed model on relevant downstream tasks, including 3D instance segmentation and image-goal navigation. Project Page: https://segmast3r.github.io/
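For readers unfamiliar with the AUPRC metric cited above, the following is a minimal sketch of how an area under the precision-recall curve could be computed for segment matching. It assumes that each candidate segment pair receives a confidence score and a binary ground-truth label; the function name `average_precision`, the toy scores, and the step-wise (average precision) formulation are illustrative assumptions, not the paper's evaluation protocol.

```python
def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve (average precision).

    scores: confidence for each candidate segment pair (hypothetical input).
    labels: 1 if the pair is a true match, 0 otherwise.
    Tied scores are not treated specially; they keep their input order.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive pair")

    # Rank candidate pairs from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)

    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # Precision at this rank, accumulated at each recall step.
            ap += hits / rank
    return ap / total_pos


# Toy example: four candidate pairs, two of which are true matches.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
toy_ap = average_precision(toy_scores, toy_labels)
```

In this toy case the true matches sit at ranks 1 and 3, so the precisions at those ranks (1/1 and 2/3) average to 5/6. A perfect ranking, with every true match scored above every non-match, yields 1.0.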