SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations

📅 2023-09-19
🏛️ IEEE Transactions on Pattern Analysis and Machine Intelligence
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address the high annotation cost and poor generalization of 3D LiDAR point cloud models, this paper proposes SPOT, a scalable self-supervised pre-training paradigm based on occupancy prediction. The authors show, both theoretically and empirically and for the first time, that occupancy prediction learns generic, transferable 3D representations; they introduce a LiDAR beam re-sampling augmentation and a class-balancing strategy to mitigate sensor heterogeneity and label bias; and the framework supports large-scale pre-training on purely unlabeled data. Evaluated on nuScenes and Waymo, the approach yields substantial improvements on downstream 3D detection and segmentation tasks, exhibiting strong cross-task and cross-domain generalization. Moreover, performance consistently improves as the pre-training data volume grows, validating both scalability and practical utility.
📝 Abstract
Annotating 3D LiDAR point clouds for perception tasks is fundamental for many applications, e.g., autonomous driving, yet it remains notoriously labor-intensive. The pretraining-finetuning approach can alleviate the labeling burden by fine-tuning a pre-trained backbone on various downstream datasets and tasks. In this paper, we propose SPOT, namely Scalable Pre-training via Occupancy prediction for learning Transferable 3D representations, under such a label-efficient fine-tuning paradigm. SPOT is effective on various public datasets with different downstream tasks, showcasing its general representation power, cross-domain robustness, and data scalability, which are three key factors for real-world application. Specifically, we show, both theoretically and empirically and for the first time, that general representation learning can be achieved through the task of occupancy prediction. Then, to address the domain gap caused by different LiDAR sensors and annotation methods, we develop a beam re-sampling technique for point cloud augmentation, combined with a class-balancing strategy. Furthermore, we observe scalable pre-training: downstream performance across all experiments improves with more pre-training data. Additionally, this pre-training strategy remains compatible with unlabeled data. We hope that our findings will facilitate the understanding of LiDAR point clouds and pave the way for future advancements in LiDAR pre-training.
Problem

Research questions and friction points this paper is trying to address.

Reducing labor-intensive 3D LiDAR annotation for perception tasks
Learning transferable 3D representations via occupancy prediction
Addressing domain gaps in LiDAR sensors and annotation methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Occupancy prediction enables general 3D representation learning
Beam re-sampling and class-balancing reduce domain gaps
Scalable pre-training improves with more unlabeled data
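The beam re-sampling augmentation listed above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the elevation-angle binning scheme, the uniform beam-dropping rule, and the 64-to-32 beam counts are all assumptions chosen for demonstration.

```python
import numpy as np

def beam_resample(points, n_beams_src=64, n_beams_tgt=32):
    """Simulate a lower-resolution LiDAR by keeping a subset of beams.

    points: (N, 3+) array with columns x, y, z (plus optional features).
    Beams are approximated by binning each point's elevation angle into
    n_beams_src pseudo-beams, then keeping every k-th beam so roughly
    n_beams_tgt beams remain (a sketch of cross-sensor augmentation).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Elevation angle of each point relative to the sensor origin.
    elevation = np.arctan2(z, np.hypot(x, y))
    lo, hi = elevation.min(), elevation.max()
    # Assign each point to a pseudo-beam index in [0, n_beams_src).
    beam_idx = np.clip(
        ((elevation - lo) / (hi - lo + 1e-9) * n_beams_src).astype(int),
        0, n_beams_src - 1)
    # Keep every k-th pseudo-beam to approximate the target beam count.
    step = max(1, n_beams_src // n_beams_tgt)
    keep = np.isin(beam_idx, np.arange(0, n_beams_src, step))
    return points[keep]
```

In practice the dropped points simulate a sparser sensor (e.g., a 32-beam LiDAR seen during fine-tuning when pre-training used 64-beam data), so the pre-trained backbone sees both densities.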