🤖 AI Summary
This work tackles the noise and sparsity of point labels obtained when predictions from image-based vision-language models (VLMs) are back-projected onto LiDAR point clouds in driving scenes. The proposed framework, LOSC, consolidates these sparse 2D-derived labels by enforcing spatio-temporal consistency and robustness to image-level augmentations, yielding higher-quality pseudo-labels that supervise the training of a 3D segmentation network. On nuScenes and SemanticKITTI, LOSC outperforms existing zero-shot open-vocabulary semantic and panoptic segmentation methods by significant margins, setting a new state of the art. These results support combining cross-modal label transfer with label refinement for robust 3D scene understanding.
📝 Abstract
We study the use of image-based Vision-Language Models (VLMs) for open-vocabulary segmentation of lidar scans in driving settings. Classically, image semantics can be back-projected onto 3D point clouds, but the resulting point labels are noisy and sparse. We consolidate these labels to enforce both spatio-temporal consistency and robustness to image-level augmentations, and then train a 3D network on the refined labels. This simple method, called LOSC, outperforms the state of the art in zero-shot open-vocabulary semantic and panoptic segmentation on both nuScenes and SemanticKITTI, by significant margins.
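The back-projection step mentioned above can be illustrated with a minimal sketch: each lidar point is transformed into the camera frame, projected through a pinhole model, and assigned the 2D semantic label it lands on. This is a generic, hypothetical implementation, not the paper's code; the function name, the `sem_map` input (an integer label map, e.g. from a VLM), and the transform conventions are assumptions.

```python
import numpy as np

def backproject_labels(points_lidar, sem_map, T_cam_from_lidar, K):
    """Assign each lidar point the 2D semantic label it projects onto.

    points_lidar     : (N, 3) xyz coordinates in the lidar frame
    sem_map          : (H, W) integer label map (hypothetical VLM output)
    T_cam_from_lidar : (4, 4) rigid transform from lidar to camera frame
    K                : (3, 3) camera intrinsics
    Returns (N,) labels; -1 marks points outside the image or behind the camera.
    """
    n = points_lidar.shape[0]
    pts_h = np.hstack([points_lidar, np.ones((n, 1))])   # homogeneous coordinates
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]      # lidar -> camera frame
    z = pts_cam[:, 2]
    uv = (K @ pts_cam.T).T                               # pinhole projection
    uv = uv[:, :2] / np.maximum(uv[:, 2:3], 1e-9)        # perspective divide
    h, w = sem_map.shape
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    valid = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    labels = np.full(n, -1, dtype=np.int64)              # -1 = no label
    labels[valid] = sem_map[v[valid], u[valid]]
    return labels
```

Labels produced this way are exactly the noisy, sparse supervision the paper starts from: many points fall outside any camera frustum (the `-1` entries), and occlusions or calibration errors corrupt the rest, which motivates the subsequent consolidation step.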