CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching

šŸ“… 2025-03-31
šŸ“ˆ Citations: 0
✨ Influential: 0
šŸ¤– AI Summary
Existing semi-dense image matching methods suffer from redundant computation, attention interference from non-covisible regions, and unilateral (target-view-only) subpixel refinement, which limits both localization accuracy and efficiency. To address these issues, we propose CoMatch, a Transformer-based matching framework with dynamic covisibility awareness. Its core contributions are: (i) a covisibility-guided token condenser that dynamically estimates per-token covisibility scores and adaptively aggregates tokens; (ii) a covisibility-assisted attention mechanism that selectively suppresses messages from non-covisible tokens; and (iii) a fine correlation module that jointly refines matching keypoints in both source and target views to subpixel level. Across multiple public benchmarks, CoMatch delivers notable gains in matching accuracy, inference efficiency, and cross-scene generalization, and is particularly strong in tasks sensitive to keypoint localization.
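The first contribution, covisibility-guided token condensation, can be illustrated with a minimal numpy sketch: each non-overlapping neighborhood of coarse tokens is pooled into one token, weighted by (here randomly faked) covisibility scores. All names, the 2Ɨ2 grouping, and the softmax weighting are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def condense_tokens(tokens, cov_scores, group=2):
    """Aggregate each non-overlapping `group` x `group` neighborhood of
    tokens into one token, softly weighted by covisibility scores, so
    likely-covisible tokens dominate the condensed representation."""
    H, W, C = tokens.shape
    Hg, Wg = H // group, W // group
    out = np.zeros((Hg, Wg, C))
    for i in range(Hg):
        for j in range(Wg):
            patch = tokens[i*group:(i+1)*group, j*group:(j+1)*group].reshape(-1, C)
            w = cov_scores[i*group:(i+1)*group, j*group:(j+1)*group].reshape(-1)
            w = np.exp(w) / np.exp(w).sum()   # softmax over the neighborhood
            out[i, j] = w @ patch             # covisibility-weighted pooling
    return out

# Toy coarse feature map: an 8x8 grid of 16-dim tokens.
feat = rng.standard_normal((8, 8, 16))
cov = rng.random((8, 8))                      # stand-in for predicted covisibility
reduced = condense_tokens(feat, cov)
print(reduced.shape)  # (4, 4, 16): 4x fewer tokens enter attention
```

Condensing before attention is what cuts the redundant computation the summary mentions: attention cost scales with the square of the token count, so a 2Ɨ2 reduction shrinks it roughly 16-fold.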

šŸ“ Abstract
This prospective study proposes CoMatch, a novel semi-dense image matcher with dynamic covisibility awareness and bilateral subpixel accuracy. Firstly, observing that modeling context interaction over the entire coarse feature map elicits highly redundant computation due to the neighboring representation similarity of tokens, a covisibility-guided token condenser is introduced to adaptively aggregate tokens in light of their covisibility scores that are dynamically estimated, thereby ensuring computational efficiency while improving the representational capacity of aggregated tokens simultaneously. Secondly, considering that feature interaction with massive non-covisible areas is distracting, which may degrade feature distinctiveness, a covisibility-assisted attention mechanism is deployed to selectively suppress irrelevant message broadcast from non-covisible reduced tokens, resulting in robust and compact attention to relevant rather than all ones. Thirdly, we find that at the fine-level stage, current methods adjust only the target view's keypoints to subpixel level, while those in the source view remain restricted at the coarse level and thus not informative enough, detrimental to keypoint location-sensitive usages. A simple yet potent fine correlation module is developed to refine the matching candidates in both source and target views to subpixel level, attaining attractive performance improvement. Thorough experimentation across an array of public benchmarks affirms CoMatch's promising accuracy, efficiency, and generalizability.
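The covisibility-assisted attention described above can be sketched as ordinary scaled dot-product attention with non-covisible keys excluded from the softmax. This is a minimal illustration of the masking idea only; the boolean mask, shapes, and single-head form are assumptions, not the paper's actual architecture.

```python
import numpy as np

def covisibility_attention(q, k, v, cov_mask):
    """Scaled dot-product attention in which keys flagged non-covisible
    (cov_mask == False) are suppressed, so queries attend only to
    relevant, covisible tokens."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    logits[:, ~cov_mask] = -1e9               # silence non-covisible tokens
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(1)
q = rng.standard_normal((5, 8))               # 5 query tokens, dim 8
k = rng.standard_normal((6, 8))               # 6 key tokens from the other view
v = rng.standard_normal((6, 8))
mask = np.array([True, True, False, True, False, True])  # hypothetical covisibility
out = covisibility_attention(q, k, v, mask)
print(out.shape)  # (5, 8)
```

The output is numerically identical to running attention over only the covisible subset of keys and values, which is exactly the "attend to relevant rather than all" behavior the abstract describes.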
Problem

Research questions and friction points this paper is trying to address.

Reduces redundant computation in image matching via covisibility-guided token condensation
Enhances feature distinctiveness by suppressing non-covisible area interactions in attention
Improves subpixel accuracy by refining keypoints in both source and target views
Innovation

Methods, ideas, or system contributions that make the work stand out.

Covisibility-guided token condenser for efficient computation
Covisibility-assisted attention mechanism for robust feature interaction
Bilateral subpixel refinement in source and target views
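The bilateral refinement idea, adjusting the keypoint in the source view as well as the target view, can be sketched with a soft-argmax over a local correlation patch in each view. The 5Ɨ5 patch size, temperature, and coordinates below are illustrative assumptions, not the paper's fine correlation module.

```python
import numpy as np

def soft_argmax_offset(corr, temperature=0.1):
    """Expected (dy, dx) offset of the correlation peak relative to the
    patch center: a differentiable subpixel correction."""
    h, w = corr.shape
    p = np.exp(corr / temperature)
    p /= p.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    dy = (p * ys).sum() - (h - 1) / 2
    dx = (p * xs).sum() - (w - 1) / 2
    return np.array([dy, dx])

# Toy 5x5 correlation patches around one coarse match, one per view.
src_corr = np.zeros((5, 5)); src_corr[2, 3] = 5.0   # peak right of center
tgt_corr = np.zeros((5, 5)); tgt_corr[1, 2] = 5.0   # peak above center

# Refine BOTH endpoints of the match, not just the target one.
src_pt = np.array([10.0, 20.0]) + soft_argmax_offset(src_corr)  # ~ (10, 21)
tgt_pt = np.array([42.0, 17.0]) + soft_argmax_offset(tgt_corr)  # ~ (41, 17)
print(src_pt, tgt_pt)
```

Unilateral methods apply this correction only to `tgt_pt`, leaving `src_pt` quantized to the coarse grid; applying it on both sides is what makes the matches useful for keypoint location-sensitive tasks such as pose estimation.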
Zizhuo Li
Wuhan University
Computer Vision · Image Matching · Multi-View Geometry
Yifan Lu
Wuhan University, China
Linfeng Tang
Wuhan University, China
Shihuai Zhang
Wuhan University, China
Jiayi Ma
Wuhan University
Computer Vision · Image Fusion · Image Matching