🤖 AI Summary
To address the need for lightweight, robust, and real-time LiDAR-based global localization of unmanned ground vehicles in complex environments, this paper proposes a bird's-eye-view (BEV)-oriented end-to-end LiDAR localization method. The core innovation is the joint modeling of local feature rotation equivariance and global descriptor rotation invariance via a rotation equivariant module (REM) and a Rotation Equivariant and Invariant Network (REIN), trained with place-level weak supervision alone, so no ground-truth poses are required. The method integrates a lightweight CNN backbone, a BEV representation of LiDAR data, NetVLAD, and end-to-end optimization. It achieves state-of-the-art performance across multiple benchmarks (e.g., KITTI), generalizes well across days, years, and sensor configurations, runs at real-time inference speed, and requires only 3,000 place-labeled KITTI frames for training.
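The BEV representation mentioned above can be sketched as follows: a minimal, hypothetical projection of a LiDAR point cloud onto a ground-plane grid, with normalized point density as pixel intensity. This is a common BEV encoding; the paper's exact discretization, image size, and normalization may differ, and `points_to_bev` and its parameters are illustrative names, not the authors' API.

```python
import numpy as np

def points_to_bev(points, grid_res=0.4, bev_size=200):
    """Project LiDAR points (N, 3) onto a ground-plane grid,
    using normalized point density as pixel intensity
    (a common BEV encoding; details here are assumptions)."""
    half = bev_size * grid_res / 2.0
    # keep points inside the square region around the sensor
    mask = (np.abs(points[:, 0]) < half) & (np.abs(points[:, 1]) < half)
    pts = points[mask]
    # map metric x, y coordinates to pixel indices
    ix = ((pts[:, 0] + half) / grid_res).astype(int).clip(0, bev_size - 1)
    iy = ((pts[:, 1] + half) / grid_res).astype(int).clip(0, bev_size - 1)
    bev = np.zeros((bev_size, bev_size), dtype=np.float32)
    np.add.at(bev, (ix, iy), 1.0)  # unbuffered accumulation per cell
    if bev.max() > 0:
        bev /= bev.max()  # normalize density to [0, 1]
    return bev

scan = np.random.default_rng(0).uniform(-30, 30, size=(10000, 3))
img = points_to_bev(scan)
print(img.shape)  # (200, 200)
```

The resulting image-like array is what allows standard 2D CNNs to be applied directly to LiDAR data.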
📝 Abstract
This article introduces BEVPlace++, a novel, fast, and robust LiDAR global localization method for unmanned ground vehicles. It applies lightweight convolutional neural networks (CNNs) to bird's-eye-view (BEV) image-like representations of LiDAR data to achieve accurate global localization through place recognition followed by 3-DoF pose estimation. Our detailed analyses reveal that CNNs are inherently effective at extracting distinctive features from LiDAR BEV images; remarkably, keypoints of two BEV images with large relative translations can be matched reliably using CNN-extracted features. Building on this insight, we design a rotation equivariant module (REM) to obtain distinctive features while enhancing robustness to rotational changes. A Rotation Equivariant and Invariant Network (REIN) is then developed by cascading REM with a descriptor generator, NetVLAD, to sequentially produce rotation equivariant local features and rotation invariant global descriptors. The global descriptors are first used for robust place recognition, and the local features are then used for accurate pose estimation. Experimental results on multiple public datasets demonstrate that BEVPlace++, even when trained on a small dataset (3,000 frames of KITTI) with place labels only, generalizes well to unseen environments, performs consistently across different days and years, and adapts to various types of LiDAR scanners. BEVPlace++ achieves state-of-the-art performance in the subtasks of global localization, including place recognition, loop closure detection, and global localization. Additionally, BEVPlace++ is lightweight, runs in real time, and does not require accurate pose supervision, making it convenient for deployment. The source code is publicly available at https://github.com/zjuluolun/BEVPlace.
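The rotation-invariance property that REIN learns can be illustrated with a hand-crafted toy example: a yaw rotation of the sensor preserves each point's horizontal range, so any descriptor built only from ranges is exactly rotation invariant. This is an analogy for intuition only, not the paper's learned REM/NetVLAD pipeline; `rotate_xy` and `radial_descriptor` are hypothetical helpers.

```python
import numpy as np

def rotate_xy(points, theta):
    """Rotate a point cloud (N, 3) about the vertical (z) axis."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    out = points.copy()
    out[:, :2] = points[:, :2] @ R.T
    return out

def radial_descriptor(points, n_bins=32, r_max=40.0):
    """Normalized histogram of horizontal ranges. Yaw rotation leaves
    ranges unchanged, so this toy descriptor is rotation invariant."""
    r = np.linalg.norm(points[:, :2], axis=1)
    hist, _ = np.histogram(r, bins=n_bins, range=(0.0, r_max))
    return hist / max(hist.sum(), 1)

rng = np.random.default_rng(1)
scan = rng.uniform(-30, 30, size=(5000, 3))
d0 = radial_descriptor(scan)
d1 = radial_descriptor(rotate_xy(scan, np.deg2rad(90)))
print(np.allclose(d0, d1))  # True
```

REIN achieves the same invariance property with learned, far more distinctive features: REM makes local features rotation equivariant, and NetVLAD's aggregation turns them into a rotation invariant global descriptor.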