Temporal Overlapping Prediction: A Self-supervised Pre-training Method for LiDAR Moving Object Segmentation

📅 2025-03-10

🤖 AI Summary
This work addresses the heavy reliance on costly manual annotations in LiDAR point cloud moving object segmentation (MOS). We propose TOP (Temporal Overlapping Prediction), a self-supervised pre-training method: the model predicts the occupancy states of temporal overlapping points, that is, points commonly observed by the current and adjacent scans, and is jointly trained with current occupancy reconstruction as an auxiliary objective, so pre-training requires no labels. Because conventional IoU is biased toward objects with more scanned points and can neglect small or distant objects, we introduce the mIoU_obj metric to evaluate object-level performance. On nuScenes and SemanticKITTI, TOP outperforms both the supervised training-from-scratch baseline and other self-supervised pre-training baselines by up to 28.77% relative improvement. It also transfers well across LiDAR setups and generalizes to other tasks.

📝 Abstract
Moving object segmentation (MOS) on LiDAR point clouds is crucial for autonomous systems like self-driving vehicles. Previous supervised approaches rely heavily on costly manual annotations, while LiDAR sequences naturally capture temporal motion cues that can be leveraged for self-supervised learning. In this paper, we propose Temporal Overlapping Prediction (TOP), a self-supervised pre-training method that alleviates the labeling burden for MOS. TOP explores the temporal overlapping points that are commonly observed by the current and adjacent scans, and learns spatiotemporal representations by predicting the occupancy states of these temporal overlapping points. Moreover, we utilize current occupancy reconstruction as an auxiliary pre-training objective, which enhances the model's awareness of the current scene structure. We conduct extensive experiments and observe that the conventional metric Intersection-over-Union (IoU) shows a strong bias toward objects with more scanned points, which might neglect small or distant objects. To compensate for this bias, we introduce an additional metric called mIoU_obj to evaluate object-level performance. Experiments on nuScenes and SemanticKITTI show that TOP outperforms both the supervised training-from-scratch baseline and other self-supervised pre-training baselines by up to 28.77% relative improvement, demonstrating strong transferability across LiDAR setups and generalization to other tasks. Code and pre-trained models will be publicly available upon publication.

Problem

Research questions and friction points this paper is trying to address.

Reduces manual annotation costs for LiDAR moving object segmentation.
Leverages temporal motion cues for self-supervised learning in LiDAR sequences.
Introduces an object-level metric, mIoU_obj, to counter the bias of conventional IoU toward objects with more scanned points.
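
The bias that motivates an object-level metric can be illustrated with a small sketch. The abstract does not give the exact definition of mIoU_obj, so the code below assumes one plausible reading: score each ground-truth moving object by the IoU computed on its own points, then average over objects. The function names, the object ids, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def point_iou(pred, gt):
    """Point-level IoU of the 'moving' class over parallel boolean lists."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 1.0


def object_mean_iou(pred, gt, obj_ids):
    """Assumed object-level score: mean of per-object IoUs.

    obj_ids[i] is the (hypothetical) instance id of point i, or None for
    background. Only ground-truth moving points are grouped into objects.
    """
    groups = defaultdict(list)
    for i, oid in enumerate(obj_ids):
        if oid is not None and gt[i]:
            groups[oid].append(i)
    if not groups:
        return 1.0
    scores = [
        point_iou([pred[i] for i in idx], [gt[i] for i in idx])
        for idx in groups.values()
    ]
    return sum(scores) / len(scores)


# Toy scene: a nearby car with 100 points and a distant pedestrian with 5.
gt = [True] * 105
obj_ids = ["car_near"] * 100 + ["pedestrian_far"] * 5
# The prediction segments the car perfectly and misses the pedestrian entirely.
pred = [True] * 100 + [False] * 5

point_level = point_iou(pred, gt)                  # dominated by the car
object_level = object_mean_iou(pred, gt, obj_ids)  # each object counts once
```

In this toy case the point-level IoU stays above 0.95 even though one of the two objects is missed completely, while the object-averaged score drops to 0.5. That is the kind of imbalance the abstract attributes to conventional IoU.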

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised pre-training for LiDAR MOS
Temporal Overlapping Prediction (TOP) method
Auxiliary current occupancy reconstruction objective
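
As a rough illustration of the pre-training signal listed above, the sketch below builds per-voxel occupancy states from a current and an adjacent scan. It is a simplification under stated assumptions, not the authors' implementation: both scans are assumed to be already aligned in a common frame, a voxel counts as "observed" only if it contains a return (the free-space reasoning a real pipeline would need, for example ray casting, is omitted), and the function names and the voxel size are hypothetical.

```python
import math


def voxelize(points, voxel_size):
    """Map (x, y, z) points to the set of integer voxel indices they fall in."""
    return {tuple(math.floor(c / voxel_size) for c in p) for p in points}


def occupancy_targets(current_scan, adjacent_scan, voxel_size=0.5):
    """Return {voxel: (occupied_in_current, occupied_in_adjacent)}.

    Candidate voxels are those occupied in at least one of the two scans.
    Assumes both scans are expressed in the same coordinate frame.
    """
    cur = voxelize(current_scan, voxel_size)
    adj = voxelize(adjacent_scan, voxel_size)
    return {v: (v in cur, v in adj) for v in cur | adj}


# Toy scans: a static wall plus one object that moves 1 m along x.
wall = [(10.0, 0.0, 0.0), (10.0, 0.1, 0.0)]
current_scan = wall + [(2.0, 0.0, 0.0)]
adjacent_scan = wall + [(3.0, 0.0, 0.0)]

targets = occupancy_targets(current_scan, adjacent_scan)
# Voxels whose occupancy differs between scans hint at motion.
changed = {v for v, (c, a) in targets.items() if c != a}
```

Here the wall voxel is occupied in both scans, while the two voxels touched by the moving object change state. A network trained to predict such states from the input sequence has to pick up motion cues without any manual labels, which is the intuition the abstract describes.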