ETO: Efficient Transformer-based Local Feature Matching by Organizing Multiple Homography Hypotheses

📅 2024-10-30
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the trade-off between efficiency and accuracy in local feature matching. The authors propose an efficient Transformer architecture that models multiple homography hypotheses to approximate the continuous correspondence between images, coupled with a lightweight unidirectional cross-attention module that reduces the computational cost of refinement. On YFCC100M, the method is competitive with LoFTR in matching accuracy while running at 4 times the inference speed; further evaluations on MegaDepth, ScanNet, and HPatches support its effectiveness. The core contribution is the combined design of multi-homography modeling and unidirectional cross-attention for efficient local feature matching.
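
To make the cross-attention idea concrete, here is a minimal NumPy sketch. It assumes that "unidirectional" means features from one image act as queries over features from the other image, with no reverse pass; the paper's actual module is not described on this page, so the function name `unidirectional_cross_attention`, the absence of learned projections, and the toy shapes are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def unidirectional_cross_attention(query_feats, context_feats):
    """Update query_feats (N, D) by attending to context_feats (M, D).

    Hypothetical simplification: no learned Q/K/V projections and a
    single head. Only the query side is updated; the context side is
    left untouched, which is what makes the pass one-directional.
    """
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)  # (N, M)
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    return weights @ context_feats                       # (N, D)

rng = np.random.default_rng(0)
feats_a = rng.standard_normal((4, 8))  # toy features from image A
feats_b = rng.standard_normal((6, 8))  # toy features from image B
updated_a = unidirectional_cross_attention(feats_a, feats_b)
```

A bidirectional scheme would run a second pass with the roles of `feats_a` and `feats_b` swapped; skipping that pass is where the saving in this sketch comes from.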

📝 Abstract
We tackle the efficiency problem of learning local feature matching. Recent advancements have given rise to purely CNN-based and transformer-based approaches, each augmented with deep learning techniques. While CNN-based methods often excel in matching speed, transformer-based methods tend to provide more accurate matches. We propose an efficient transformer-based network architecture for local feature matching. The technique builds on two components: multiple homography hypotheses that approximate the continuous correspondence in the real world, and uni-directional cross-attention that accelerates the refinement. On the YFCC100M dataset, our matching accuracy is competitive with LoFTR, a state-of-the-art transformer-based architecture, while inference is 4 times as fast, outpacing even the CNN-based methods. Comprehensive evaluations on other open datasets such as MegaDepth, ScanNet, and HPatches demonstrate our method's efficacy, highlighting its potential to benefit a wide array of downstream applications.
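
As a rough illustration of the homography idea, the sketch below assigns each image patch its own 3x3 homography and uses it to map points from that patch into the other image. How the paper parameterizes, predicts, and organizes its hypotheses is not stated on this page; the helper `apply_homography`, the `patch_homographies` dictionary, and the matrix values are hypothetical.

```python
import numpy as np

def apply_homography(H, pts):
    """Map (N, 2) pixel coordinates through a 3x3 homography H."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    mapped = pts_h @ H.T                              # row i is H @ p_i
    return mapped[:, :2] / mapped[:, 2:3]             # back to Cartesian

# Hypothetical hypotheses: one homography per patch index.
patch_homographies = {
    0: np.array([[1.0, 0.0, 5.0],
                 [0.0, 1.0, -2.0],
                 [0.0, 0.0, 1.0]]),  # pure translation by (5, -2)
    1: np.array([[2.0, 0.0, 0.0],
                 [0.0, 2.0, 0.0],
                 [0.0, 0.0, 1.0]]),  # uniform scaling by 2
}

pts = np.array([[0.0, 0.0], [4.0, 4.0]])
warped = {k: apply_homography(H, pts) for k, H in patch_homographies.items()}
```

Because each patch carries its own transform, a set of locally planar approximations can stand in for a correspondence field that is not globally a single homography.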
Problem

Research questions and friction points this paper is trying to address.

Image Matching
Local Feature
Similarity Search
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based method
Feature matching
CNN-Transformer integration
Junjie Ni
State Key Lab of CAD&CG, Zhejiang University
Guofeng Zhang
State Key Lab of CAD&CG, Zhejiang University
Guanglin Li
State Key Lab of CAD&CG, Zhejiang University
Yijin Li
State Key Lab of CAD&CG, Zhejiang University, China
Computer Vision
Xinyang Liu
State Key Lab of CAD&CG, Zhejiang University
Zhaoyang Huang
Chinese University of Hong Kong
computer vision
Hujun Bao
State Key Lab of CAD&CG, Zhejiang University