🤖 AI Summary
Existing semi-dense feature matching methods establish coarse matches through a global search over the entire feature map, which limits both accuracy and efficiency, particularly at high resolutions. To address this, we propose CasP, a pipeline guided by cascaded correspondence priors that splits matching into two phases: (1) a first phase that identifies one-to-many prior areas, and (2) a second phase that determines one-to-one matches by searching only within those areas. A region-based selective cross-attention mechanism bridges the two phases to enhance feature discriminability. CasP also incorporates high-level features, which reduces the cost of low-level feature extraction. The speedup grows with resolution: at a resolution of 1152, the lite model is ≈2.2× faster than ELoFTR, and experiments show superior geometric estimation and strong cross-domain generalization. The method is suited to latency-sensitive, high-robustness applications such as SLAM and UAV systems.
📝 Abstract
Semi-dense feature matching methods have shown strong performance in challenging scenarios. However, the existing pipeline relies on a global search across the entire feature map to establish coarse matches, limiting further improvements in accuracy and efficiency. Motivated by this limitation, we propose a novel pipeline, CasP, which leverages cascaded correspondence priors for guidance. Specifically, the matching stage is decomposed into two progressive phases, bridged by a region-based selective cross-attention mechanism designed to enhance feature discriminability. In the second phase, one-to-one matches are determined by restricting the search range to the one-to-many prior areas identified in the first phase. Additionally, this pipeline benefits from incorporating high-level features, which helps reduce the computational costs of low-level feature extraction. The acceleration gains of CasP increase with higher resolution, and our lite model achieves a speedup of $\sim 2.2\times$ at a resolution of 1152 compared to the most efficient method, ELoFTR. Furthermore, extensive experiments demonstrate its superiority in geometric estimation, particularly with impressive cross-domain generalization. These advantages highlight its potential for latency-sensitive and high-robustness applications, such as SLAM and UAV systems. Code is available at https://github.com/pq-chen/CasP.
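The two-phase idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the CasP implementation: it omits the region-based selective cross-attention and any learned components, and the function name `cascaded_match`, the parameters `k` and `ratio`, and the use of plain dot-product similarity are all assumptions made for illustration. Phase 1 picks the top-`k` coarse cells in image B for each coarse cell in image A (a one-to-many prior); phase 2 assigns each fine cell in A its best match among only the fine cells covered by that prior.

```python
import numpy as np

def cascaded_match(coarse_a, coarse_b, fine_a, fine_b, k=2, ratio=2):
    """Illustrative two-phase matching (hypothetical helper, not CasP's API).

    coarse_*: (Hc, Wc, C) feature maps; fine_*: (Hc*ratio, Wc*ratio, C).
    Both images are assumed to share the same feature-map shapes.
    Returns a dict mapping each fine cell (y, x) in A to a fine cell in B.
    """
    hc, wc, c = coarse_b.shape
    hf, wf = hc * ratio, wc * ratio

    # Phase 1: one-to-many prior. For every coarse cell in A, keep the
    # k most similar coarse cells in B.
    sim = coarse_a.reshape(-1, c) @ coarse_b.reshape(-1, c).T
    topk = np.argsort(-sim, axis=1)[:, :k]          # (Hc*Wc, k)

    fb = fine_b.reshape(-1, fine_b.shape[-1])
    matches = {}
    for y in range(hf):
        for x in range(wf):
            parent = (y // ratio) * wc + (x // ratio)
            # Phase 2: search only the fine cells inside the prior areas.
            cand = np.array([
                (cy * ratio + dy) * wf + (cx * ratio + dx)
                for cy, cx in (divmod(int(cell), wc) for cell in topk[parent])
                for dy in range(ratio)
                for dx in range(ratio)
            ])
            scores = fb[cand] @ fine_a[y, x]
            matches[(y, x)] = divmod(int(cand[np.argmax(scores)]), wf)
    return matches
```

In this sketch each fine query compares against `k * ratio**2` candidates instead of all `hf * wf` fine cells in B. The candidate count stays fixed as the feature map grows while a global search would scale with its area, which is in line with the abstract's observation that the acceleration gains increase at higher resolution.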