AI Summary
Existing feature matching methods rely on scarce multi-view imagery, and their single-view 2D encoders model 3D correspondences poorly, resulting in weak cross-domain generalization. To address this, we propose Lift to Match (L2M), a two-stage framework that learns 3D-aware feature matching from single-view images without multi-view supervision, the first of its kind. L2M lifts 2D features into 3D space and models the resulting 3D feature field via differentiable Gaussian representations, enabling self-supervised matching learning through novel-view rendering. Trained solely on large-scale single-view image collections, L2M achieves state-of-the-art performance across multiple zero-shot matching benchmarks, significantly improves robustness in complex scenes, and enhances cross-domain generalization, demonstrating strong scalability and practical applicability in real-world vision tasks.
Abstract
Feature matching plays a fundamental role in many computer vision tasks, yet existing methods rely heavily on scarce and clean multi-view image collections, which constrains their generalization to diverse and challenging scenarios. Moreover, conventional feature encoders are typically trained on single-view 2D images, limiting their capacity to capture 3D-aware correspondences. In this paper, we propose a novel two-stage framework, named Lift to Match (L2M), that lifts 2D images into 3D space, taking full advantage of large-scale and diverse single-view images. Specifically, in the first stage, we learn a 3D-aware feature encoder using a combination of multi-view image synthesis and a 3D feature Gaussian representation, which injects 3D geometric knowledge into the encoder. In the second stage, a novel-view rendering strategy, combined with large-scale synthetic data generated from single-view images, is used to learn a feature decoder for robust feature matching, achieving generalization across diverse domains. Extensive experiments demonstrate that our method achieves superior generalization on zero-shot evaluation benchmarks, highlighting the effectiveness of the proposed framework for robust feature matching.
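The two-stage pipeline described in the abstract can be sketched at a high level. The sketch below is purely illustrative: the function names (`encode_2d`, `lift_to_3d`, `render_novel_view`), feature dimensions, and data flow are assumptions made for clarity, not the authors' actual implementation, and the 3D lifting and rendering steps are reduced to shape-level placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_2d(image):
    # Stage 1 component (placeholder): a 2D encoder producing
    # per-pixel features of shape (H, W, C).
    h, w, _ = image.shape
    return rng.standard_normal((h, w, 8))

def lift_to_3d(features):
    # Stage 1 (illustrative): lift per-pixel features into a set of
    # 3D feature Gaussians, here reduced to (position, feature) pairs.
    h, w, c = features.shape
    positions = rng.standard_normal((h * w, 3))
    return positions, features.reshape(h * w, c)

def render_novel_view(positions, feats, h, w):
    # Stage 2 (illustrative): render the 3D feature field from a novel
    # viewpoint; a real system would use differentiable Gaussian
    # rendering, here we only reshape to keep the sketch runnable.
    return feats[: h * w].reshape(h, w, -1)

# Single-view image -> 2D features -> 3D feature field -> novel view.
image = rng.standard_normal((4, 4, 3))
feats_2d = encode_2d(image)
pos, feats_3d = lift_to_3d(feats_2d)
novel_feats = render_novel_view(pos, feats_3d, 4, 4)
# A matching loss would then compare feats_2d against novel_feats,
# providing self-supervision without any multi-view imagery.
print(novel_feats.shape)  # (4, 4, 8)
```

The point of the sketch is the data flow: because the novel view is rendered from the lifted 3D field of a single image, correspondence supervision comes for free, which is what lets the method train on single-view collections alone.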