Deep Learning Reforms Image Matching: A Survey and Outlook

📅 2025-06-05
🤖 AI Summary
Image matching still struggles with robustness and accuracy in complex scenes, which limits downstream tasks such as visual localization and 3D reconstruction. This survey reviews how deep learning has progressively reshaped the classical pipeline of detector-descriptor, feature matcher, outlier filter, and geometric estimator. Its taxonomy follows that pipeline along two dimensions: replacing individual steps with learnable alternatives (detector-descriptor, outlier filter, geometric estimator), and merging several steps into end-to-end learnable modules (middle-end sparse matchers, end-to-end semi-dense and dense matchers, and pose regressors). Representative methods are benchmarked on relative pose recovery, homography estimation, and visual localization, and the survey closes with open challenges and directions for future research.

📝 Abstract
Image matching, which establishes correspondences between two-view images to recover 3D structure and camera geometry, serves as a cornerstone in computer vision and underpins a wide range of applications, including visual localization, 3D reconstruction, and simultaneous localization and mapping (SLAM). Traditional pipelines composed of "detector-descriptor, feature matcher, outlier filter, and geometric estimator" falter in challenging scenarios. Recent deep-learning advances have significantly boosted both robustness and accuracy. This survey adopts a distinct perspective by comprehensively reviewing how deep learning has incrementally transformed the classical image matching pipeline. Our taxonomy closely follows the traditional pipeline in two key aspects: i) the replacement of individual steps in the traditional pipeline with learnable alternatives, including the learnable detector-descriptor, outlier filter, and geometric estimator; and ii) the merging of multiple steps into end-to-end learnable modules, encompassing the middle-end sparse matcher, the end-to-end semi-dense/dense matcher, and the pose regressor. We first examine the design principles, advantages, and limitations of both aspects, and then benchmark representative methods on relative pose recovery, homography estimation, and visual localization tasks. Finally, we discuss open challenges and outline promising directions for future research. By systematically categorizing and evaluating deep learning-driven strategies, this survey offers a clear overview of the evolving image matching landscape and highlights key avenues for further innovation.
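To make the four classical stages concrete, here is a minimal pure-Python sketch, not taken from the survey. It assumes keypoints and descriptors have already been produced by some detector-descriptor, uses mutual nearest-neighbor search as the feature matcher, and folds the outlier filter and geometric estimator into a RANSAC-style hypothesize-and-verify step for a toy 2D translation model (real pipelines estimate a homography or an essential matrix instead). The function names `mutual_nearest_neighbors` and `consensus_translation` and the pixel threshold are hypothetical, chosen only for illustration.

```python
def mutual_nearest_neighbors(desc_a, desc_b):
    """Feature matcher: keep pairs (i, j) that are each other's nearest neighbor."""
    def sqdist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    # Nearest neighbor in B for every descriptor in A, and vice versa.
    nn_ab = [min(range(len(desc_b)), key=lambda j: sqdist(d, desc_b[j]))
             for d in desc_a]
    nn_ba = [min(range(len(desc_a)), key=lambda i: sqdist(d, desc_a[i]))
             for d in desc_b]
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]


def consensus_translation(pts_a, pts_b, matches, thresh=2.0):
    """Outlier filter + geometric estimator for a toy translation model.

    One match is a minimal sample for a 2D translation, so every match is
    tried as a hypothesis (exhaustive instead of random sampling) and the
    hypothesis with the largest consensus set wins.
    """
    if not matches:
        raise ValueError("need at least one putative match")

    best_inliers = []
    for i, j in matches:
        tx = pts_b[j][0] - pts_a[i][0]
        ty = pts_b[j][1] - pts_a[i][1]
        inliers = [(p, q) for p, q in matches
                   if abs(pts_a[p][0] + tx - pts_b[q][0]) <= thresh
                   and abs(pts_a[p][1] + ty - pts_b[q][1]) <= thresh]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers

    # Refit the model on the consensus set (mean displacement).
    n = len(best_inliers)
    tx = sum(pts_b[q][0] - pts_a[p][0] for p, q in best_inliers) / n
    ty = sum(pts_b[q][1] - pts_a[p][1] for p, q in best_inliers) / n
    return (tx, ty), best_inliers


# Toy data: three keypoints shifted by (5, -3) plus one spurious match.
pts_a = [(10, 10), (50, 20), (30, 80), (70, 70)]
pts_b = [(15, 7), (55, 17), (35, 77), (200, 200)]
desc_a = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
desc_b = [(1, 0, 0.05), (0.05, 1, 0), (0, 0.05, 1), (1, 1, 0.1)]

matches = mutual_nearest_neighbors(desc_a, desc_b)
translation, inliers = consensus_translation(pts_a, pts_b, matches)
```

In this toy setup the fourth pair matches well in descriptor space but is geometrically inconsistent, so it survives the matcher and is rejected only at the consensus step. The learnable outlier filters and geometric estimators covered by the survey target that same failure mode.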
Problem

Research questions and friction points this paper is trying to address.

Deep learning enhances image matching robustness and accuracy
Replacing traditional pipeline steps with learnable alternatives
Merging multiple steps into end-to-end learnable modules
Innovation

Methods, ideas, or system contributions that make the work stand out.

Replaces traditional steps with learnable components
Merges multiple steps into end-to-end modules
Benchmarks representative methods on relative pose recovery, homography estimation, and visual localization
Shihua Zhang
Electronic Information School, Wuhan University, Wuhan, 430072, China
Zizhuo Li
Wuhan University
Computer Vision, Image Matching, Multi-View Geometry
Kaining Zhang
Wuhan University
Computer Vision, Image Matching, 3D Vision
Yifan Lu
Electronic Information School, Wuhan University, Wuhan, 430072, China
Yuxin Deng
Electronic Information School, Wuhan University, Wuhan, 430072, China
Linfeng Tang
Electronic Information School, Wuhan University, Wuhan, 430072, China
Xingyu Jiang
Huazhong University of Science and Technology
Computer Vision, Multimodal Learning, 3D Vision
Jiayi Ma
Wuhan University
Computer Vision, Image Fusion, Image Matching