CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching

šŸ“… 2025-03-31
šŸ“ˆ Citations: 0
✨ Influential: 0
šŸ¤– AI Summary
Existing semi-dense image matching methods suffer from redundant computation, attention interference from non-covisible regions, and unilateral (target-view-only) subpixel refinement, which limits both localization accuracy and efficiency. To address these issues, we propose CoMatch, a Transformer-based matching framework with dynamic covisibility awareness. Its core contributions are: (i) a covisibility-guided token condenser that dynamically estimates per-token covisibility scores and adaptively aggregates tokens; (ii) a covisibility-assisted attention mechanism that selectively suppresses messages from non-covisible tokens; and (iii) a fine correlation module that jointly refines matching keypoints in both source and target views to subpixel level. Across multiple public benchmarks, CoMatch delivers notable gains in matching accuracy, inference efficiency, and cross-scene generalization, and is particularly strong in tasks sensitive to keypoint localization.
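The first contribution, covisibility-guided token condensation, can be illustrated with a minimal numpy sketch: each non-overlapping neighborhood of coarse tokens is pooled into one token, weighted by (here randomly faked) covisibility scores. All names, the 2Ɨ2 grouping, and the softmax weighting are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def condense_tokens(tokens, cov_scores, group=2):
    """Aggregate each non-overlapping `group` x `group` neighborhood of
    tokens into one token, softly weighted by covisibility scores, so
    likely-covisible tokens dominate the condensed representation."""
    H, W, C = tokens.shape
    Hg, Wg = H // group, W // group
    out = np.zeros((Hg, Wg, C))
    for i in range(Hg):
        for j in range(Wg):
            patch = tokens[i*group:(i+1)*group, j*group:(j+1)*group].reshape(-1, C)
            w = cov_scores[i*group:(i+1)*group, j*group:(j+1)*group].reshape(-1)
            w = np.exp(w) / np.exp(w).sum()   # softmax over the neighborhood
            out[i, j] = w @ patch             # covisibility-weighted pooling
    return out

# Toy coarse feature map: an 8x8 grid of 16-dim tokens.
feat = rng.standard_normal((8, 8, 16))
cov = rng.random((8, 8))                      # stand-in for predicted covisibility
reduced = condense_tokens(feat, cov)
print(reduced.shape)  # (4, 4, 16): 4x fewer tokens enter attention
```

Condensing before attention is what cuts the redundant computation the summary mentions: attention cost scales with the square of the token count, so a 2Ɨ2 reduction shrinks it roughly 16-fold.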

šŸ“ Abstract
This prospective study proposes CoMatch, a novel semi-dense image matcher with dynamic covisibility awareness and bilateral subpixel accuracy. Firstly, observing that modeling context interaction over the entire coarse feature map elicits highly redundant computation due to the neighboring representation similarity of tokens, a covisibility-guided token condenser is introduced to adaptively aggregate tokens in light of their covisibility scores that are dynamically estimated, thereby ensuring computational efficiency while improving the representational capacity of aggregated tokens simultaneously. Secondly, considering that feature interaction with massive non-covisible areas is distracting, which may degrade feature distinctiveness, a covisibility-assisted attention mechanism is deployed to selectively suppress irrelevant message broadcast from non-covisible reduced tokens, resulting in robust and compact attention to relevant rather than all ones. Thirdly, we find that at the fine-level stage, current methods adjust only the target view's keypoints to subpixel level, while those in the source view remain restricted at the coarse level and thus not informative enough, detrimental to keypoint location-sensitive usages. A simple yet potent fine correlation module is developed to refine the matching candidates in both source and target views to subpixel level, attaining attractive performance improvement. Thorough experimentation across an array of public benchmarks affirms CoMatch's promising accuracy, efficiency, and generalizability.
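The covisibility-assisted attention described above can be sketched as ordinary scaled dot-product attention with non-covisible keys excluded from the softmax. This is a minimal illustration of the masking idea only; the boolean mask, shapes, and single-head form are assumptions, not the paper's actual architecture.

```python
import numpy as np

def covisibility_attention(q, k, v, cov_mask):
    """Scaled dot-product attention in which keys flagged non-covisible
    (cov_mask == False) are suppressed, so queries attend only to
    relevant, covisible tokens."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    logits[:, ~cov_mask] = -1e9               # silence non-covisible tokens
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(1)
q = rng.standard_normal((5, 8))               # 5 query tokens, dim 8
k = rng.standard_normal((6, 8))               # 6 key tokens from the other view
v = rng.standard_normal((6, 8))
mask = np.array([True, True, False, True, False, True])  # hypothetical covisibility
out = covisibility_attention(q, k, v, mask)
print(out.shape)  # (5, 8)
```

The output is numerically identical to running attention over only the covisible subset of keys and values, which is exactly the "attend to relevant rather than all" behavior the abstract describes.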
Problem

Research questions and friction points this paper is trying to address.

Reduces redundant computation in image matching via covisibility-guided token condensation
Enhances feature distinctiveness by suppressing non-covisible area interactions in attention
Improves subpixel accuracy by refining keypoints in both source and target views
Innovation

Methods, ideas, or system contributions that make the work stand out.

Covisibility-guided token condenser for efficient computation
Covisibility-assisted attention mechanism for robust feature interaction
Bilateral subpixel refinement in source and target views
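The bilateral refinement idea, adjusting the keypoint in the source view as well as the target view, can be sketched with a soft-argmax over a local correlation patch in each view. The 5Ɨ5 patch size, temperature, and coordinates below are illustrative assumptions, not the paper's fine correlation module.

```python
import numpy as np

def soft_argmax_offset(corr, temperature=0.1):
    """Expected (dy, dx) offset of the correlation peak relative to the
    patch center: a differentiable subpixel correction."""
    h, w = corr.shape
    p = np.exp(corr / temperature)
    p /= p.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    dy = (p * ys).sum() - (h - 1) / 2
    dx = (p * xs).sum() - (w - 1) / 2
    return np.array([dy, dx])

# Toy 5x5 correlation patches around one coarse match, one per view.
src_corr = np.zeros((5, 5)); src_corr[2, 3] = 5.0   # peak right of center
tgt_corr = np.zeros((5, 5)); tgt_corr[1, 2] = 5.0   # peak above center

# Refine BOTH endpoints of the match, not just the target one.
src_pt = np.array([10.0, 20.0]) + soft_argmax_offset(src_corr)  # ~ (10, 21)
tgt_pt = np.array([42.0, 17.0]) + soft_argmax_offset(tgt_corr)  # ~ (41, 17)
print(src_pt, tgt_pt)
```

Unilateral methods apply this correction only to `tgt_pt`, leaving `src_pt` quantized to the coarse grid; applying it on both sides is what makes the matches useful for keypoint location-sensitive tasks such as pose estimation.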
Zizhuo Li
Wuhan University
Computer Vision · Image Matching · Multi-View Geometry
Yifan Lu
Wuhan University, China
Linfeng Tang
Wuhan University, China
Shihuai Zhang
Wuhan University, China
Jiayi Ma
Wuhan University
Computer Vision · Image Fusion · Image Matching