AI Summary
To address high redundancy, low efficiency, and poor cross-resolution generalization in image feature matching, this paper proposes two SAM-based matching methods: MESA and DMESA. Both follow an implicit-semantic area matching paradigm: SAM segments each image into regions, an Area Graph (AG) models the relationships among those regions and yields candidate areas, and point-level matching is then performed only within matched, semantically consistent areas. MESA adopts a sparse matching framework and solves area matching as graph energy minimization. DMESA, its dense counterpart, derives area matches from dense matching distributions that are built from off-the-shelf patch matching with a Gaussian Mixture Model and refined by Expectation Maximization. Evaluated on five indoor and outdoor datasets, both methods consistently improve five point-matching baselines. Notably, DMESA runs nearly 5× faster than MESA with competitive accuracy, and the methods show promising generalization and improved robustness to image resolution variations.
Abstract
We propose MESA and DMESA as novel feature matching methods, which utilize the Segment Anything Model (SAM) to effectively mitigate matching redundancy. The key insight of our methods is to establish implicit-semantic area matching prior to point matching, based on the advanced image understanding of SAM. Informative area matches with consistent internal semantics can then undergo dense feature comparison, facilitating precise inside-area point matching. Specifically, MESA adopts a sparse matching framework and first obtains candidate areas from SAM results through a novel Area Graph (AG). Then, area matching among the candidates is formulated as graph energy minimization and solved by graphical models derived from AG. To address the efficiency issue of MESA, we further propose DMESA as its dense counterpart, applying a dense matching framework. After candidate areas are identified by AG, DMESA establishes area matches by generating dense matching distributions. The distributions are produced from off-the-shelf patch matching using a Gaussian Mixture Model and refined via Expectation Maximization. With less repetitive computation, DMESA showcases a speed improvement of nearly five times compared to MESA, while maintaining competitive accuracy. Our methods are extensively evaluated on five datasets encompassing indoor and outdoor scenes. The results illustrate consistent performance improvements from our methods for five distinct point matching baselines across all datasets. Furthermore, our methods exhibit promising generalization and improved robustness against image resolution variations. The code is publicly available at https://github.com/Easonyesheng/A2PM-MESA.
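The idea of matching areas first and points second can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that area matches are already available as pairs of axis-aligned boxes, and that a point matcher is any callable returning crop-local coordinates. The names `match_within_areas` and `center_matcher` are hypothetical and are not taken from the A2PM-MESA repository.

```python
import numpy as np


def match_within_areas(img0, img1, area_matches, point_matcher):
    """Run a point matcher inside each matched area pair and map results back.

    area_matches: list of ((x0, y0, x1, y1), (u0, v0, u1, v1)) boxes, one per
        matched area pair, in pixel coordinates of img0 and img1 (assumed input).
    point_matcher: hypothetical callable(crop0, crop1) -> (pts0, pts1), each an
        (N, 2) array of (x, y) coordinates local to its crop.
    """
    all0, all1 = [], []
    for (x0, y0, x1, y1), (u0, v0, u1, v1) in area_matches:
        # Restrict the search to the matched areas only.
        crop0 = img0[y0:y1, x0:x1]
        crop1 = img1[v0:v1, u0:u1]
        pts0, pts1 = point_matcher(crop0, crop1)
        # Shift crop-local coordinates back into full-image coordinates.
        all0.append(np.asarray(pts0, dtype=float).reshape(-1, 2) + (x0, y0))
        all1.append(np.asarray(pts1, dtype=float).reshape(-1, 2) + (u0, v0))
    if not all0:
        return np.empty((0, 2)), np.empty((0, 2))
    return np.concatenate(all0), np.concatenate(all1)


def center_matcher(crop0, crop1):
    """Toy stand-in for a real point matcher: pairs the two crop centers."""
    c0 = [(crop0.shape[1] - 1) / 2, (crop0.shape[0] - 1) / 2]
    c1 = [(crop1.shape[1] - 1) / 2, (crop1.shape[0] - 1) / 2]
    return np.array([c0]), np.array([c1])
```

In practice `center_matcher` would be replaced by one of the point matching baselines, and crops would typically be resized to the matcher's input resolution, which adds a per-axis scale factor to the coordinate mapping. Because the matcher only sees matched areas, it never compares features across unrelated parts of the two images, which is where the reduction in redundancy comes from.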