🤖 AI Summary
To address the limited robustness of autonomous-vehicle relocalization under GPS-denied conditions, this paper proposes a height-aware, end-to-end BEV semantic relocalization framework. Methodologically, it introduces a multi-height-layer BEV feature modeling mechanism, extending a U-Net-style architecture so the network reasons about the scene on several height layers before flattening the BEV features. It further jointly optimizes neural BEV segmentation with a differentiable template matcher over Standard Definition map (SD-map) embeddings, achieving both geometric awareness and semantic alignment. Evaluated on the nuScenes dataset, the method improves BEV segmentation by up to 4.11 IoU, outperforms transformer-based approaches of comparable computational cost by 1.7 to 2.8 mIoU, and improves relocalization by over 26% recall accuracy, significantly enhancing localization reliability in GPS-denied scenarios.
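The "reason per height layer, then flatten" idea can be illustrated with a minimal sketch: features are kept on a discrete height axis and collapsed into a single 2D BEV map with softmax weights. The function name, shapes, and the use of a single global weight vector are illustrative assumptions, not the paper's actual design (which learns the height handling inside the U-Net decoder).

```python
import numpy as np

def flatten_height_layers(feats, layer_logits):
    """Collapse a stack of per-height-layer BEV features into one 2D map.

    feats:        (L, C, H, W) features at L discrete height layers.
    layer_logits: (L,) learnable scores turned into softmax weights.
    Shapes and the global-weight scheme are illustrative only.
    """
    # Numerically stable softmax over the height axis.
    w = np.exp(layer_logits - layer_logits.max())
    w /= w.sum()
    # Weighted sum over layers: (L,) x (L, C, H, W) -> (C, H, W).
    return np.tensordot(w, feats, axes=(0, 0))
```

Because the weights come from a softmax, the collapse is differentiable, so gradients from the segmentation loss can flow back into the per-layer features.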
📝 Abstract
Efficient relocalization is essential for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails. Recent advances in Bird’s-Eye-View (BEV) segmentation allow for accurate estimation of local scene appearance and, in turn, can benefit the relocalization of the vehicle. However, one downside of BEV methods is the heavy computation required to leverage the geometric constraints. This paper presents U-BEV, a U-Net-inspired architecture that extends the current state of the art by allowing the BEV to reason about the scene on multiple height layers before flattening the BEV features. We show that this extension boosts the performance of U-BEV by up to 4.11 IoU. Additionally, we combine the encoded neural BEV with a differentiable template matcher to perform relocalization on neural SD-map data. The model is fully end-to-end trainable and outperforms transformer-based BEV methods of similar computational complexity by 1.7 to 2.8 mIoU and BEV-based relocalization by over 26% recall accuracy on the nuScenes dataset.
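The differentiable template matcher mentioned above can be sketched as follows: slide the predicted BEV over a larger (neural) map tile, score every offset by normalized cross-correlation, and turn the score map into a pose estimate with a soft argmax, so the localization error is differentiable end to end. This is a generic translation-only sketch under assumed shapes and names; the paper's matcher and its SD-map encoding are not reproduced here.

```python
import numpy as np

def differentiable_template_match(bev, map_tile, temperature=0.05):
    """Soft template matching of a BEV patch inside a larger map tile.

    bev:      (H, W) predicted BEV representation (template).
    map_tile: (Hm, Wm) map region to localize in, Hm >= H, Wm >= W.
    Returns a soft-argmax (y, x) offset estimate; names are illustrative.
    """
    H, W = bev.shape
    Hm, Wm = map_tile.shape
    scores = np.empty((Hm - H + 1, Wm - W + 1))
    b = (bev - bev.mean()) / (bev.std() + 1e-8)  # zero-mean, unit-std template
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            patch = map_tile[i:i + H, j:j + W]
            p = (patch - patch.mean()) / (patch.std() + 1e-8)
            scores[i, j] = (b * p).mean()  # normalized cross-correlation
    # Softmax over all offsets, then a differentiable expected offset.
    w = np.exp((scores - scores.max()) / temperature)
    w /= w.sum()
    ys, xs = np.indices(scores.shape)
    return (w * ys).sum(), (w * xs).sum()
```

Because the estimate is an expectation under a softmax rather than a hard argmax, a localization loss on (y, x) backpropagates through the score map into the BEV network, which is what makes joint training of segmentation and relocalization possible.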