SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations

📅 2023-09-19
🏛️ IEEE Transactions on Pattern Analysis and Machine Intelligence
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address the high annotation cost and poor generalization of 3D LiDAR point cloud models, this paper proposes SPOT, a scalable self-supervised pre-training paradigm based on occupancy prediction. The authors show, both theoretically and empirically and for the first time, that occupancy prediction learns generic, transferable 3D representations; they introduce a LiDAR beam re-sampling augmentation and a class-balancing strategy to mitigate sensor heterogeneity and label bias; and the framework supports large-scale pre-training on purely unlabeled data. Evaluated on nuScenes and Waymo, the approach yields substantial improvements on downstream 3D detection and segmentation tasks, exhibiting strong cross-task and cross-domain generalization. Moreover, performance consistently improves as the pre-training data volume grows, validating both scalability and practical utility.
📝 Abstract
Annotating 3D LiDAR point clouds for perception tasks is fundamental for many applications, e.g., autonomous driving, yet it remains notoriously labor-intensive. The pretraining-finetuning approach can alleviate the labeling burden by fine-tuning a pre-trained backbone on various downstream datasets and tasks. In this paper, we propose SPOT, namely Scalable Pre-training via Occupancy prediction for learning Transferable 3D representations, under such a label-efficient fine-tuning paradigm. SPOT is effective on various public datasets with different downstream tasks, showcasing its general representation power, cross-domain robustness, and data scalability, which are three key factors for real-world application. Specifically, we show, both theoretically and empirically and for the first time, that general representation learning can be achieved through the task of occupancy prediction. Then, to address the domain gap caused by different LiDAR sensors and annotation methods, we develop a beam re-sampling technique for point cloud augmentation, combined with a class-balancing strategy. Furthermore, we observe scalable pre-training: downstream performance across all experiments improves with more pre-training data. Additionally, this pre-training strategy remains compatible with unlabeled data. We hope that our findings will facilitate the understanding of LiDAR point clouds and pave the way for future advancements in LiDAR pre-training.
Problem

Research questions and friction points this paper is trying to address.

Reducing labor-intensive 3D LiDAR annotation for perception tasks
Learning transferable 3D representations via occupancy prediction
Addressing domain gaps in LiDAR sensors and annotation methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Occupancy prediction enables general 3D representation learning
Beam re-sampling and class-balancing reduce domain gaps
Scalable pre-training improves with more unlabeled data
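The beam re-sampling augmentation listed above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the elevation-angle binning scheme, the uniform beam-dropping rule, and the 64-to-32 beam counts are all assumptions chosen for demonstration.

```python
import numpy as np

def beam_resample(points, n_beams_src=64, n_beams_tgt=32):
    """Simulate a lower-resolution LiDAR by keeping a subset of beams.

    points: (N, 3+) array with columns x, y, z (plus optional features).
    Beams are approximated by binning each point's elevation angle into
    n_beams_src pseudo-beams, then keeping every k-th beam so roughly
    n_beams_tgt beams remain (a sketch of cross-sensor augmentation).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Elevation angle of each point relative to the sensor origin.
    elevation = np.arctan2(z, np.hypot(x, y))
    lo, hi = elevation.min(), elevation.max()
    # Assign each point to a pseudo-beam index in [0, n_beams_src).
    beam_idx = np.clip(
        ((elevation - lo) / (hi - lo + 1e-9) * n_beams_src).astype(int),
        0, n_beams_src - 1)
    # Keep every k-th pseudo-beam to approximate the target beam count.
    step = max(1, n_beams_src // n_beams_tgt)
    keep = np.isin(beam_idx, np.arange(0, n_beams_src, step))
    return points[keep]
```

In practice the dropped points simulate a sparser sensor (e.g., a 32-beam LiDAR seen during fine-tuning when pre-training used 64-beam data), so the pre-trained backbone sees both densities.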