SegMASt3R: Geometry Grounded Segment Matching

📅 2025-10-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenging problem of segment matching across wide-baseline images—particularly under extreme viewpoint variations (up to 180°), occlusion, and illumination changes. To this end, it is the first to introduce geometric inductive biases from 3D foundation models into segment matching. Methodologically, it integrates 3D spatial reasoning with SAM2’s segmentation priors to establish cross-image region correspondences that are both semantically coherent and geometrically consistent. The core contribution lies in explicitly modeling scene geometry via 3D representations, which substantially enhances matching robustness under large viewpoint shifts. On the ScanNet++ and Replica benchmarks, the method achieves up to a 30% improvement in AUPRC over prior state-of-the-art approaches. Furthermore, it consistently improves performance on the downstream tasks of 3D instance segmentation and image-goal navigation.

📝 Abstract
Segment matching is an important intermediate task in computer vision that establishes correspondences between semantically or geometrically coherent regions across images. Unlike keypoint matching, which focuses on localized features, segment matching captures structured regions, offering greater robustness to occlusions, lighting variations, and viewpoint changes. In this paper, we leverage the spatial understanding of 3D foundation models to tackle wide-baseline segment matching, a challenging setting involving extreme viewpoint shifts. We propose an architecture that uses the inductive bias of these 3D foundation models to match segments across image pairs with up to 180-degree viewpoint change. Extensive experiments show that our approach outperforms state-of-the-art methods, including the SAM2 video propagator and local feature matching methods, by up to 30% on the AUPRC metric on the ScanNet++ and Replica datasets. We further demonstrate the benefits of the proposed model on relevant downstream tasks, including 3D instance segmentation and image-goal navigation. Project Page: https://segmast3r.github.io/
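The headline number above is reported in AUPRC (area under the precision-recall curve), which scores a ranked list of candidate segment-pair correspondences against ground-truth matches. As a minimal sketch of what that metric computes—not the paper's exact evaluation protocol, and with hypothetical example scores—AUPRC can be obtained by sweeping a threshold over match confidences and integrating precision over recall:

```python
def auprc(scores, labels):
    """Area under the precision-recall curve for binary match labels.

    scores: per-pair confidence that two segments correspond (hypothetical).
    labels: 1 if the pair is a ground-truth match, else 0.
    Integrates precision step-wise over recall, in descending score order.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    total_pos = sum(labels)
    tp = fp = 0
    area, prev_recall = 0.0, 0.0
    for _, is_match in pairs:
        if is_match:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        area += precision * (recall - prev_recall)
        prev_recall = recall
    return area

# A ranker that puts all true matches first achieves AUPRC = 1.0.
print(auprc([0.9, 0.8, 0.3], [1, 1, 0]))  # → 1.0
```

In practice, library implementations such as scikit-learn's `average_precision_score` compute the same quantity; the sketch is only meant to make the metric behind the 30% claim concrete.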
Problem

Research questions and friction points this paper is trying to address.

Matching image segments under extreme viewpoint changes
Leveraging 3D foundation models for wide-baseline matching
Improving robustness over existing segmentation and matching methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses 3D foundation models for segment matching
Matches segments across 180-degree viewpoint changes
Outperforms SAM2 and local feature methods
Rohit Jayanti
Graduate Researcher, IIIT-Hyderabad
Visual SLAM · Structure-from-Motion · 3D Scene Understanding
Swayam Agrawal
IIIT Hyderabad
Vansh Garg
IIIT Hyderabad
Siddharth Tourani
University of Heidelberg
Muhammad Haris Khan
MBZUAI
Sourav Garg
(former) Research Fellow, Uni. Adelaide
Robotics · Computer Vision · Deep Learning
Madhava Krishna
IIIT Hyderabad