🤖 AI Summary
To address unreliable UAV landing localization caused by GPS failure in urban environments, this paper proposes a cross-modal, high-precision visual localization method based on event cameras. The method introduces a spatiotemporal feature-guided pose estimation module and a motion-aware hierarchical fusion and optimization mechanism. Crucially, it employs a temporal distance field (TDF) to enable efficient matching between asynchronous event streams and 3D point cloud maps, and it embeds motion priors into both event filtering and pose optimization to suppress noise and enhance robustness. Experimental results show a rotational error of 1.34°, a translational error of 6.9 mm, and a tracking latency of only 10.08 ms, outperforming baseline methods in both accuracy and efficiency by over 50% and significantly enhancing the reliability and real-time performance of autonomous, precise UAV landing in complex urban scenes.
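The summary does not spell out how the temporal distance field is built or queried, but the core idea of matching events against a projected 3D map via a distance field can be sketched as follows. This is a minimal illustration with hypothetical data; `build_tdf` and `matching_cost` are our own names, not the paper's API, and a real system would use a fast distance transform rather than this brute-force version:

```python
import numpy as np

def build_tdf(events, shape, t_ref, tau=0.03):
    """Distance field over a temporal window: for every pixel, the
    Euclidean distance to the nearest pixel that fired an event
    within tau seconds of the reference time t_ref."""
    pts = np.array([(x, y) for x, y, t in events if abs(t - t_ref) <= tau],
                   dtype=float)
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    # Brute-force nearest-event distance for each pixel (fine for a demo).
    d = np.linalg.norm(grid[:, None, :] - pts[None, :, :], axis=2).min(axis=1)
    return d.reshape(shape)

def matching_cost(tdf, projected_pts):
    """Average field value at the 2D projections of 3D map points under a
    candidate pose: lower cost means map edges align with event edges."""
    cost, n = 0.0, 0
    for u, v in projected_pts:
        ui, vi = int(round(u)), int(round(v))
        if 0 <= vi < tdf.shape[0] and 0 <= ui < tdf.shape[1]:
            cost += tdf[vi, ui]
            n += 1
    return cost / max(n, 1)
```

A pose optimizer would then search over candidate 6-DoF poses, projecting the map points with each one and minimizing `matching_cost`; the field makes this cost smooth and cheap to evaluate.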
📝 Abstract
After years of growth, drone-based delivery is transforming logistics. At its core, real-time 6-DoF drone pose tracking enables precise flight control and accurate drone landing. With the widespread availability of urban 3D maps, the Visual Positioning Service (VPS), a mobile pose estimation system, has been adapted to enhance drone pose tracking during the landing phase, as conventional systems like GPS are unreliable in urban environments due to signal attenuation and multi-path propagation. However, deploying the current VPS on drones faces limitations in both estimation accuracy and efficiency. In this work, we redesign drone-oriented VPS with the event camera and introduce EV-Pose to enable accurate, high-frequency 6-DoF pose tracking for accurate drone landing. EV-Pose introduces a spatio-temporal feature-instructed pose estimation module that extracts a temporal distance field to enable 3D point map matching for pose estimation; and a motion-aware hierarchical fusion and optimization scheme that enhances the above estimation in accuracy and efficiency by utilizing drone motion in the early stage of event filtering and the later stage of pose optimization. Evaluation shows that EV-Pose achieves a rotation accuracy of 1.34° and a translation accuracy of 6.9 mm with a tracking latency of 10.08 ms, outperforming baselines by >50%, thus enabling accurate drone landings. Demo: https://ev-pose.github.io/
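The abstract says drone motion is used in the early stage of event filtering but gives no details. One common way to exploit a motion prior for event denoising, shown here purely as an illustrative sketch (the constant-flow model, `motion_filter`, and all parameter values are our assumptions, not the paper's method), is to warp events to a common reference time under the predicted motion and discard events that end up isolated:

```python
import numpy as np

def motion_filter(events, flow, t_ref, radius=1.5, min_neighbors=2):
    """Warp each event (x, y, t) to time t_ref using a constant
    flow prior (pixels/second) predicted from drone motion, then
    keep only events whose warped position has at least
    min_neighbors other warped events within `radius` pixels.
    Events from real scene edges collapse into tight clusters under
    the correct motion model; uncorrelated noise stays scattered."""
    ev = np.asarray(events, dtype=float)              # rows: (x, y, t)
    warped = ev[:, :2] - flow * (ev[:, 2:3] - t_ref)  # back-project to t_ref
    keep = []
    for i, p in enumerate(warped):
        d = np.linalg.norm(warped - p, axis=1)
        if (d <= radius).sum() - 1 >= min_neighbors:  # exclude self
            keep.append(i)
    return ev[keep]
```

Filtering with the motion prior before matching reduces both the noise fed into the distance-field cost and the number of events the optimizer must process, which is consistent with the accuracy and latency gains the paper reports.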