🤖 AI Summary
This work addresses LiDAR-camera extrinsic calibration, which typically relies on handcrafted calibration targets or specific static scenes and is therefore difficult to deploy online. To overcome this limitation, we propose the first self-supervised extrinsic calibration network that operates without any calibration targets. Our method employs a dual-path architecture built on difference feature maps, replacing conventional two-branch designs to strengthen cross-modal feature association while reducing model complexity. Furthermore, we introduce a depth-map-based multi-view camera augmentation strategy to improve generalization and enable online adaptive calibration. Extensive experiments on five public benchmarks and our own collected dataset demonstrate that the proposed approach significantly outperforms existing methods, achieving state-of-the-art performance in both calibration accuracy and generalization.
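The difference-map idea above can be illustrated with a minimal sketch: project the LiDAR point cloud into the image plane using the current extrinsic estimate to form a sparse depth map, then subtract it from a camera-side depth estimate, so that any miscalibration appears directly as structured residuals in a single map rather than in two separate feature branches. The function names, the simple z-buffer, and the subtraction form of the difference are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def lidar_to_sparse_depth(points, T, K, h, w):
    """Project LiDAR points (N, 3) into the camera to build a sparse depth map.

    T: 4x4 LiDAR-to-camera extrinsic, K: 3x3 camera intrinsics (illustrative).
    """
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coords
    cam = (T @ pts_h.T).T[:, :3]                            # LiDAR -> camera frame
    cam = cam[cam[:, 2] > 0.1]                              # keep points in front
    uvz = (K @ cam.T).T
    u = (uvz[:, 0] / uvz[:, 2]).astype(int)                 # pixel coordinates
    v = (uvz[:, 1] / uvz[:, 2]).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth = np.zeros((h, w))
    order = np.argsort(-cam[ok, 2])                         # z-buffer: nearest wins
    depth[v[ok][order], u[ok][order]] = cam[ok, 2][order]
    return depth

def difference_map(lidar_depth, cam_depth):
    """Difference map, nonzero only where LiDAR actually projects."""
    mask = lidar_depth > 0
    return np.where(mask, cam_depth - lidar_depth, 0.0)
```

In a network, this single-channel difference map (or a learned variant of it) can replace the two-branch feature extractors, since the cross-modal association is already encoded in the residual itself.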
📝 Abstract
LiDAR-camera extrinsic calibration is essential for multi-modal data fusion in robotic perception systems. However, existing approaches typically rely on handcrafted calibration targets (e.g., checkerboards) or specific, static scene types, limiting their adaptability and deployment in real-world autonomous and robotic applications. This article presents the first self-supervised LiDAR-camera extrinsic calibration network that operates in an online fashion and eliminates the need for specific calibration targets. We first identify a significant generalization degradation problem in prior methods, caused by the conventional single-sided data augmentation strategy. To overcome this limitation, we propose a novel double-sided data augmentation technique that generates multi-perspective camera views using estimated depth maps, thereby enhancing robustness and diversity during training. Built upon this augmentation strategy, we design a dual-path, self-supervised calibration framework that reduces the dependence on high-precision ground truth labels and supports fully adaptive online calibration. Furthermore, to improve cross-modal feature association, we replace the traditional dual-branch feature extraction design with a difference map construction process that explicitly correlates LiDAR and camera features. This not only enhances calibration accuracy but also reduces model complexity. Extensive experiments conducted on five public benchmark datasets, as well as our own recorded dataset, demonstrate that the proposed method significantly outperforms existing approaches in terms of generalizability.
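The double-sided augmentation described above hinges on synthesizing camera views from new perspectives using an estimated depth map. A minimal sketch of that step is depth-based forward warping: back-project every pixel with its depth, apply a virtual pose perturbation, and reproject into the new view. The nearest-pixel splatting and grayscale input are simplifying assumptions for illustration; the paper's actual view-synthesis scheme is not specified here:

```python
import numpy as np

def warp_to_virtual_view(img, depth, K, T_rel):
    """Forward-warp a grayscale image to a virtual camera displaced by T_rel.

    img, depth: (H, W) arrays; K: 3x3 intrinsics; T_rel: 4x4 relative pose
    of the virtual camera (illustrative names). Nearest-pixel splat, no
    occlusion handling or hole filling.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])  # 3 x HW homogeneous
    pts = np.linalg.inv(K) @ pix * depth.ravel()            # back-project to 3D
    pts_h = np.vstack([pts, np.ones(h * w)])
    proj = K @ (T_rel @ pts_h)[:3]                          # reproject to new view
    uu = np.round(proj[0] / proj[2]).astype(int)
    vv = np.round(proj[1] / proj[2]).astype(int)
    ok = (proj[2] > 0) & (uu >= 0) & (uu < w) & (vv >= 0) & (vv < h)
    out = np.zeros_like(img)
    out[vv[ok], uu[ok]] = img.ravel()[ok]                   # splat valid pixels
    return out
```

Sampling many small perturbations `T_rel` per training image yields the multi-perspective camera views that, per the abstract, counteract the generalization degradation of single-sided augmentation.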