UniCorrn: Unified Correspondence Transformer Across 2D and 3D

Date: 2026-05-05

AI Summary
This work addresses the lack of a unified framework for visual correspondence across 2D-2D, 2D-3D, and 3D-3D modality pairs, which existing methods handle with separate models. The authors propose UniCorrn, which they describe as the first shared-weight correspondence network covering all three tasks. It combines modality-specific backbones with a common encoder-decoder and introduces a dual-stream decoder that keeps appearance and positional features in separate streams. Correspondences are estimated end to end for arbitrary queries using Transformer attention. UniCorrn reports registration recall improvements of 8% on 7Scenes (2D-3D) and 10% on 3DLoMatch (3D-3D) over the prior state of the art, along with competitive results on 2D-2D matching.
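To make the idea of attention-based, query-driven matching concrete, here is a minimal NumPy sketch. It is not the authors' architecture: it assumes descriptors are already available for both inputs, and the function name `soft_correspondence` and the `temperature` parameter are hypothetical. It only shows how a softmax over feature similarity can turn a query into a predicted location in a 2D or 3D target.

```python
import numpy as np

def soft_correspondence(query_feat, target_feat, target_pos, temperature=0.1):
    """Match queries to targets with softmax attention over cosine similarity.

    query_feat:  (Q, C) query descriptors
    target_feat: (N, C) target descriptors
    target_pos:  (N, D) target coordinates (D=2 for pixels, D=3 for points)
    Returns (Q, D) expected target positions and (Q, N) attention weights.
    """
    # L2-normalize so the dot product is a cosine similarity.
    q = query_feat / np.linalg.norm(query_feat, axis=1, keepdims=True)
    t = target_feat / np.linalg.norm(target_feat, axis=1, keepdims=True)
    logits = q @ t.T / temperature
    # Subtract the row maximum for a numerically stable softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # The predicted position is the attention-weighted mean of target positions.
    return weights @ target_pos, weights

# Toy data: the two queries reuse the descriptors of targets 2 and 0.
rng = np.random.default_rng(0)
target_feat = rng.normal(size=(5, 8))
target_pos = rng.normal(size=(5, 3))   # 3D points; use (5, 2) for pixels
query_feat = target_feat[[2, 0]]

pred_pos, weights = soft_correspondence(query_feat, target_feat, target_pos)
best_match = weights.argmax(axis=1)
```

Because only `target_pos` changes dimensionality, the same function serves image and point cloud targets, which is the property a shared-weight matcher relies on.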
๐Ÿ“ Abstract
Visual correspondence across image-to-image (2D-2D), image-to-point cloud (2D-3D), and point cloud-to-point cloud (3D-3D) geometric matching forms the foundation for numerous 3D vision tasks. Despite sharing a similar problem structure, current methods use task-specific designs with separate models for each modality combination. We present UniCorrn, the first correspondence model with shared weights that unifies geometric matching across all three tasks. Our key insight is that Transformer attention naturally captures cross-modal feature similarity. We propose a dual-stream decoder that maintains separate appearance and positional feature streams. This design enables end-to-end learning through stackable layers while supporting flexible query-based correspondence estimation across heterogeneous modalities. Our architecture employs modality-specific backbones followed by shared encoder and decoder components, trained jointly on diverse data combining pseudo point clouds from depth maps with real 3D correspondence annotations. UniCorrn achieves competitive performance on 2D-2D matching and surpasses prior state-of-the-art by 8% on 7Scenes (2D-3D) and 10% on 3DLoMatch (3D-3D) in registration recall. Project website: https://neu-vi.github.io/UniCorrn
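The abstract mentions pseudo point clouds derived from depth maps but does not say how they are built. A common way to do this, shown here as a sketch under the assumption of a pinhole camera, is to back-project each pixel using the camera intrinsics. The function name `depth_to_points` and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative and not taken from the paper.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (M, 3) camera-frame point cloud.

    Assumes a pinhole camera; pixels with non-positive depth are dropped.
    """
    h, w = depth.shape
    # u indexes columns (x direction), v indexes rows (y direction).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]

# Toy 2x3 depth map with one invalid pixel at the top-left corner.
depth = np.full((2, 3), 2.0)
depth[0, 0] = 0.0
points = depth_to_points(depth, fx=1.0, fy=1.0, cx=1.0, cy=0.5)
```

Each surviving pixel keeps its image location, so every 3D point has a known 2D counterpart, which is what makes depth maps a convenient source of 2D-3D correspondence supervision.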
Problem

Research questions and friction points this paper is trying to address.

visual correspondence
geometric matching
2D-3D alignment
3D vision
cross-modal matching
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified Correspondence
Cross-Modal Transformer
Dual-Stream Decoder
Geometric Matching
Heterogeneous Modalities