🤖 AI Summary
Monocular camera localization in LiDAR maps and camera-LiDAR extrinsic calibration generalize poorly to unseen sensors and scenes, and existing methods typically require retraining or fine-tuning. This paper proposes CMRNext, a cross-modal matching framework that is independent of sensor-specific parameters. It reformulates point-to-pixel matching as an optical flow estimation problem, which enables zero-shot transfer. The pipeline combines deep cross-modal feature matching, correspondences derived from the predicted optical flow, and Perspective-n-Point (PnP) pose estimation, so no retraining is needed for a new setup. Evaluated on six robotic platforms (three public datasets and three in-house robots), the method outperforms existing approaches on both monocular localization in LiDAR maps and extrinsic calibration, and it generalizes to previously unseen environments and sensor setups.
📝 Abstract
LiDARs are widely used for mapping and localization in dynamic environments. However, their high cost limits their widespread adoption. On the other hand, monocular localization in LiDAR maps using inexpensive cameras is a cost-effective alternative for large-scale deployment. Nevertheless, most existing approaches struggle to generalize to new sensor setups and environments, requiring retraining or fine-tuning. In this paper, we present CMRNext, a novel approach for camera-LiDAR matching that is independent of sensor-specific parameters, generalizable, and can be used in the wild for monocular localization in LiDAR maps and camera-LiDAR extrinsic calibration. CMRNext exploits recent advances in deep neural networks for matching cross-modal data and standard geometric techniques for robust pose estimation. We reformulate the point-pixel matching problem as an optical flow estimation problem and solve the Perspective-n-Point problem based on the resulting correspondences to find the relative pose between the camera and the LiDAR point cloud. We extensively evaluate CMRNext on six different robotic platforms, including three publicly available datasets and three in-house robots. Our experimental evaluations demonstrate that CMRNext outperforms existing approaches on both tasks and effectively generalizes to previously unseen environments and sensor setups in a zero-shot manner. We make the code and pre-trained models publicly available at http://cmrnext.cs.uni-freiburg.de.
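To make the reformulation concrete, the following is a minimal sketch of how a predicted flow field could be turned into 2D-3D correspondences for a PnP solver. It is not the released CMRNext code. It assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, LiDAR points already transformed into the camera frame by an initial pose guess, and a hypothetical `flow` callable standing in for the network output. All function and parameter names are illustrative.

```python
from typing import Callable, List, Optional, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project(point: Point3D, fx: float, fy: float,
            cx: float, cy: float) -> Optional[Pixel]:
    """Pinhole projection of a 3D point given in the camera frame.

    Returns None for points at or behind the image plane (z <= 0).
    """
    x, y, z = point
    if z <= 0.0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def correspondences_from_flow(
    points: List[Point3D],
    flow: Callable[[float, float], Pixel],  # hypothetical network output lookup
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> List[Tuple[Point3D, Pixel]]:
    """Pair each visible 3D point with the pixel the flow says it matches.

    Assumption: `points` were placed in the camera frame using an initial
    pose guess, so their projections are only approximately correct. The
    flow (du, dv) at each projection is read as the displacement to the
    pixel where that point actually appears in the camera image.
    """
    matches = []
    for p in points:
        uv = project(p, fx, fy, cx, cy)
        if uv is None:
            continue  # behind the camera
        u, v = uv
        if not (0.0 <= u < width and 0.0 <= v < height):
            continue  # projects outside the image
        du, dv = flow(u, v)
        matches.append((p, (u + du, v + dv)))
    return matches
```

The resulting list of (3D point, 2D pixel) pairs is the input a PnP solver expects. One option, assuming OpenCV is available, is `cv2.solvePnPRansac`, which would return the transform that corrects the initial pose guess while rejecting outlier matches. Because the camera intrinsics enter only through the projection and the PnP step rather than through learned weights, this structure is consistent with the abstract's claim of independence from sensor-specific parameters.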