AI Summary
To address low accuracy, poor real-time performance, and reliance on calibration targets or hand-crafted environmental features in LiDAR point cloud–camera image cross-modal registration, this paper proposes an unsupervised, end-to-end multi-view projection registration framework. Our method introduces two key innovations: (1) a novel patch-to-pixel differentiable attention matching mechanism that enables fine-grained cross-modal correspondence; and (2) a multi-scale CNN architecture that jointly models multi-view 2D projections of point clouds and image features, significantly enhancing robustness under small-overlap conditions. The framework requires no calibration boards or manual feature engineering, ensuring strong generalizability. Evaluated on KITTI, it achieves a registration accuracy of 99.2% with real-time inference capability. On nuScenes, it surpasses state-of-the-art methods in both accuracy and speed.
Abstract
The primary requirement for cross-modal data fusion is the precise alignment of data from different sensors. However, calibration between LiDAR point clouds and camera images is typically time-consuming and requires an external calibration board or specific environmental features. Cross-modal registration effectively solves this problem by aligning the data directly, without external calibration. However, due to the domain gap between the point cloud and the image, existing methods rarely achieve satisfactory registration accuracy while maintaining real-time performance. To address this issue, we propose a framework that projects point clouds into several 2D representations for matching with camera images, which not only leverages the geometric characteristics of LiDAR point clouds more effectively but also bridges the domain gap between the point cloud and the image. Moreover, to tackle the challenges of cross-modal differences and the limited overlap between LiDAR point clouds and images in the image matching task, we introduce a multi-scale feature extraction network that effectively extracts features from both camera images and the projection maps of the LiDAR point cloud. Additionally, we propose a patch-to-pixel matching network to provide more effective supervision and achieve higher accuracy. We validate our model through experiments on the KITTI and nuScenes datasets. Our network achieves real-time performance and extremely high registration accuracy; on the KITTI dataset, it attains a registration accuracy of over 99%.
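To make the "2D representation" idea concrete, the sketch below shows one common way to project a LiDAR point cloud into a 2D map: a spherical (range-image) projection. This is an illustrative example only, not the paper's exact projection pipeline; the field-of-view values are assumptions matching a typical 64-beam sensor such as the one used in KITTI.

```python
import numpy as np

def range_projection(points, h=64, w=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) LiDAR point cloud onto an (h, w) range image
    via spherical coordinates. FOV bounds are illustrative assumptions
    for a 64-beam sensor; they are not taken from the paper."""
    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)

    yaw = np.arctan2(y, x)  # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(depth, 1e-8), -1.0, 1.0))

    # Normalize angles to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * w          # column index from azimuth
    v = (1.0 - (pitch - fov_down) / fov) * h   # row index from elevation

    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    # Keep the nearest point per pixel: write far points first so
    # closer points overwrite them.
    order = np.argsort(-depth)
    image = np.zeros((h, w), dtype=np.float32)
    image[v[order], u[order]] = depth[order]
    return image
```

A dense 2D map like this can then be fed to a CNN alongside the camera image, which is what lets image-style feature extractors operate on LiDAR data at all; multi-view variants simply repeat the projection from several viewpoints or parameterizations.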