CVD-SfM: A Cross-View Deep Front-end Structure-from-Motion System for Sparse Localization in Multi-Altitude Scenes

📅 2025-08-03
🤖 AI Summary
Robust, high-precision camera pose estimation from sparse imagery remains challenging in multi-altitude scenarios due to large viewpoint variations and limited geometric constraints. Method: This paper proposes a unified framework integrating cross-view Transformers, depth-guided feature matching, and structure-from-motion (SfM). It introduces cross-view Transformers into the SfM front-end for the first time to enhance semantic alignment across altitudes; designs a dual-scale feature matching strategy with depth-aware sparse optimization; and constructs two novel benchmark datasets specifically for multi-altitude pose estimation. Contribution/Results: Extensive experiments demonstrate significant improvements in both localization accuracy and robustness over state-of-the-art methods. The framework achieves superior performance under extreme viewpoint changes and sparse image conditions, validating its practical applicability in real-world applications such as UAV navigation, search-and-rescue operations, and autonomous infrastructure inspection.
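The summary describes a three-stage pipeline: a cross-view transformer front-end aligns features semantically across altitudes, a dual-scale matcher fuses coarse and fine similarity with depth-aware filtering, and a standard SfM back-end recovers poses. The paper does not publish code, so the sketch below is purely illustrative: the `dual_scale_match` function and its parameters are hypothetical names showing one plausible way to fuse two matching scores and keep only confident correspondences.

```python
# Illustrative sketch only: the interface and the fusion rule below are
# assumptions, not the paper's published implementation.
from dataclasses import dataclass


@dataclass
class Match:
    src: int      # keypoint index in the first image
    dst: int      # keypoint index in the second image
    score: float  # fused matching confidence


def dual_scale_match(coarse, fine, depth_weight=0.5, threshold=0.6):
    """Fuse coarse (semantic, transformer-level) and fine (local) similarity
    scores per keypoint pair, keeping matches above a confidence threshold.

    `depth_weight` stands in for the depth-aware weighting the summary
    mentions; here it simply blends the two score lists.
    """
    matches = []
    for i, (c, f) in enumerate(zip(coarse, fine)):
        score = depth_weight * c + (1.0 - depth_weight) * f
        if score >= threshold:
            matches.append(Match(src=i, dst=i, score=score))
    return matches
```

The surviving matches would then feed a conventional SfM back-end (e.g. incremental triangulation and bundle adjustment) to recover the multi-altitude camera poses.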

📝 Abstract
We present a novel multi-altitude camera pose estimation system, addressing the challenges of robust and accurate localization across varied altitudes when only considering sparse image input. The system effectively handles diverse environmental conditions and viewpoint variations by integrating the cross-view transformer, deep features, and structure-from-motion into a unified framework. To benchmark our method and foster further research, we introduce two newly collected datasets specifically tailored for multi-altitude camera pose estimation; datasets of this nature remain rare in the current literature. The proposed framework has been validated through extensive comparative analyses on these datasets, demonstrating that our system achieves superior performance in both accuracy and robustness for multi-altitude sparse pose estimation tasks compared to existing solutions, making it well suited for real-world robotic applications such as aerial navigation, search and rescue, and automated inspection.
Problem

Research questions and friction points this paper is trying to address.

Robust multi-altitude camera pose estimation from sparse images
Handling diverse conditions via cross-view transformer and deep features
Introducing new datasets for multi-altitude pose estimation benchmarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates cross-view transformer for diverse conditions
Combines deep features with structure-from-motion
Introduces new multi-altitude pose estimation datasets
Yaxuan Li
Stevens Institute of Technology, Hoboken, NJ, USA
Yewei Huang
Stevens Institute of Technology, Hoboken, NJ, USA
Bijay Gaudel
Stevens Institute of Technology
Robotics · Machine Learning · Reinforcement Learning · Computer Vision · Control
Hamidreza Jafarnejadsani
Stevens Institute of Technology, Hoboken, NJ, USA
Brendan Englot
Anson Wood Burchard Endowed Professor, Stevens Institute of Technology
Autonomous Navigation · Robot Learning · Marine Robotics