🤖 AI Summary
Problem: Visual localization across aerial and ground platforms in dense urban environments suffers from low accuracy and a lack of standardized, large-scale cross-modal benchmarks.
Method: We introduce the first multi-city, cross-modal benchmark for image-to-point-cloud (I2P) matching, integrating ground-level mobile imagery with airborne LiDAR point clouds. To overcome the scarcity of reliable ground truth in large-scale urban settings, we propose a scalable, annotation-free ground-truth generation method. Furthermore, we design an end-to-end cross-modal matching framework incorporating robust visual feature extraction, point-cloud geometric registration, and geospatial alignment.
Contribution/Results: Experiments demonstrate that our approach significantly improves localization accuracy, robustness, and cross-domain generalization of I2P algorithms in complex urban scenes. The proposed benchmark and framework collectively address the critical gap in evaluation methodologies for heterogeneous aerial-ground platform localization.
📝 Abstract
Accurate visual localization in dense urban environments is a fundamental task in photogrammetry, geospatial information science, and robotics. While imagery is a low-cost and widely accessible sensing modality, its effectiveness for visual odometry is often limited by textureless surfaces, severe viewpoint changes, and long-term drift. The growing public availability of airborne laser scanning (ALS) data opens new avenues for scalable and precise visual localization by using ALS as a prior map. However, the potential of ALS-based localization remains underexplored due to three key limitations: (1) the lack of platform-diverse datasets, (2) the absence of reliable ground-truth generation methods applicable to large-scale urban environments, and (3) limited validation of existing Image-to-Point Cloud (I2P) algorithms under aerial-ground cross-platform settings. To overcome these challenges, we introduce a new large-scale dataset that integrates ground-level imagery from mobile mapping systems with ALS point clouds collected in Wuhan, Hong Kong, and San Francisco.