Temporal Overlapping Prediction: A Self-supervised Pre-training Method for LiDAR Moving Object Segmentation

📅 2025-03-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the heavy reliance on costly manual annotations and poor generalization in LiDAR point cloud moving object segmentation (MOS). We propose TOP, a self-supervised pre-training framework centered on a novel temporal overlapping point prediction mechanism: leveraging motion consistency across adjacent frames to predict occupancy states of dynamic points, jointly optimized with current-frame occupancy reconstruction for fully unsupervised learning. To enable fair evaluation, particularly for small and distant objects, we introduce the mIoU_obj metric, which mitigates bias from point-count imbalance. On nuScenes and SemanticKITTI, TOP achieves up to a 28.77% relative improvement over both the supervised training-from-scratch baseline and other self-supervised pre-training baselines. Moreover, it significantly enhances transferability across LiDAR configurations and generalization to downstream tasks, demonstrating robustness beyond domain-specific supervision.

📝 Abstract
Moving object segmentation (MOS) on LiDAR point clouds is crucial for autonomous systems like self-driving vehicles. Previous supervised approaches rely heavily on costly manual annotations, while LiDAR sequences naturally capture temporal motion cues that can be leveraged for self-supervised learning. In this paper, we propose Temporal Overlapping Prediction (TOP), a self-supervised pre-training method that alleviates the labeling burden for MOS. TOP explores the temporal overlapping points that are commonly observed by the current and adjacent scans, and learns spatiotemporal representations by predicting the occupancy states of these temporal overlapping points. Moreover, we utilize current occupancy reconstruction as an auxiliary pre-training objective, which enhances the model's awareness of the current scene structure. We conduct extensive experiments and observe that the conventional Intersection-over-Union (IoU) metric is strongly biased toward objects with more scanned points, which can neglect small or distant objects. To compensate for this bias, we introduce an additional metric, mIoU_obj, to evaluate object-level performance. Experiments on nuScenes and SemanticKITTI show that TOP outperforms both the supervised training-from-scratch baseline and other self-supervised pre-training baselines by up to 28.77% relative improvement, demonstrating strong transferability across LiDAR setups and generalization to other tasks. Code and pre-trained models will be publicly available upon publication.
Problem

Research questions and friction points this paper is trying to address.

Reduces manual annotation costs for LiDAR moving object segmentation.
Leverages temporal motion cues for self-supervised learning in LiDAR sequences.
Introduces a new metric to address bias in object-level performance evaluation.
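The object-level metric can be illustrated with a short sketch. The idea, as described in the abstract, is that per-point IoU is dominated by objects with many scanned points, so an object-level score averages the IoU per object instance instead. The function below is a hypothetical reading of mIoU_obj (the paper's exact definition may differ); `instance_ids` assigns each point to a ground-truth object, with negative ids for unlabeled points.

```python
import numpy as np

def miou_obj(pred_moving, gt_moving, instance_ids):
    """Average the moving/static IoU per object instance, so a small
    distant object with few points weighs the same as a large nearby
    one. Illustrative sketch, not the paper's official implementation."""
    ious = []
    for obj in np.unique(instance_ids):
        if obj < 0:  # skip unlabeled / background points
            continue
        mask = instance_ids == obj
        inter = np.logical_and(pred_moving[mask], gt_moving[mask]).sum()
        union = np.logical_or(pred_moving[mask], gt_moving[mask]).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0
```

With this definition, an object scanned by three points contributes as much to the mean as one scanned by thousands, which is the point-count bias the metric is meant to remove.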
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised pre-training for LiDAR MOS
Temporal Overlapping Prediction (TOP) method
Auxiliary current occupancy reconstruction objective
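The core pre-training signal can be sketched as follows. Per the abstract, TOP finds points commonly observed by the current and adjacent scans and learns to predict their occupancy states. A minimal voxel-hashing version of that target construction, assuming two scans already aligned in a shared world frame (the authors' actual pipeline is not shown here, and `temporal_overlap_targets` is a hypothetical helper name):

```python
import numpy as np

def temporal_overlap_targets(scan_t, scan_adj, voxel=0.2):
    """Build self-supervised occupancy targets from two aligned LiDAR
    scans (N x 3 arrays). Voxels seen by both scans are the 'temporal
    overlapping' region; a voxel occupied at time t but empty in the
    adjacent scan is a motion cue. Illustrative sketch of the TOP idea."""
    keys_t = set(map(tuple, np.floor(scan_t / voxel).astype(int)))
    keys_a = set(map(tuple, np.floor(scan_adj / voxel).astype(int)))
    overlap = keys_t & keys_a    # occupied in both frames (likely static)
    vanished = keys_t - keys_a   # occupied only at t (candidate moving)
    return overlap, vanished
```

A network pre-trained to predict these free/occupied labels never needs manual annotation, since the targets come entirely from the sensor data itself.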
Ziliang Miao
The University of Hong Kong
Runjian Chen
The University of Hong Kong; MMLAB@HKU
Unsupervised 3D Scene Understanding
Yixi Cai
Postdoctoral Fellow, Division of Robotics, Perception and Learning, KTH
Robotics · LiDAR · Mapping
Buwei He
KTH Royal Institute of Technology
Wenquan Zhao
Southern University of Science and Technology
Wenqi Shao
Researcher at Shanghai AI Laboratory
Foundation Model Evaluation · LLM Compression · Efficient Adaptation · Multimodal Learning
Bo Zhang
Shanghai AI Laboratory
Fu Zhang
The University of Hong Kong