IGLOSS: Image Generation for Lidar Open-vocabulary Semantic Segmentation

📅 2026-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses zero-shot open-vocabulary semantic segmentation of 3D lidar point clouds. Instead of aligning point features directly with text embeddings, which exposes the method to the image-text modality gap of vision-language models such as CLIP, it generates class-conditional prototype images from text prompts. A 3D network distilled from a 2D vision foundation model produces per-point features, and each point is labeled by matching its feature against the visual features extracted from the generated prototype images, which allows categories without 3D annotations to be segmented. The framework introduces text-to-image generation into 3D open-vocabulary segmentation for the first time, using it in place of conventional direct cross-modal alignment. The method is reported as state-of-the-art for open-vocabulary semantic segmentation on both the nuScenes and SemanticKITTI benchmarks.
📝 Abstract
This paper presents a new method for the zero-shot open-vocabulary semantic segmentation (OVSS) of 3D automotive lidar data. To circumvent the recognized image-text modality gap that is intrinsic to approaches based on Vision Language Models (VLMs) such as CLIP, our method relies instead on image generation from text, to create prototype images. Given a 3D network distilled from a 2D Vision Foundation Model (VFM), we then label a point cloud by matching 3D point features with 2D image features of these prototypes. Our method is state-of-the-art for OVSS on nuScenes and SemanticKITTI. Code, pre-trained models, and generated images are available at https://github.com/valeoai/IGLOSS.
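The matching step described in the abstract can be illustrated with a minimal sketch. It assumes the per-point features from the distilled 3D network and the VFM features of the prototype images are already computed and live in the same feature space. The function names, the use of cosine similarity, and the rule of taking the best-matching prototype per class are assumptions made for illustration; the abstract does not specify them, and the released code may differ.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def label_points(point_feats, proto_feats, proto_class):
    """Assign each point the class of its most similar prototype image.

    point_feats: (N, D) features from the distilled 3D network (assumed given).
    proto_feats: (P, D) features of the generated prototype images (assumed given).
    proto_class: (P,) integer class id of each prototype image.
    """
    sims = l2_normalize(point_feats) @ l2_normalize(proto_feats).T  # (N, P)
    num_classes = int(proto_class.max()) + 1
    # Assumed aggregation: keep the best prototype per class, then the best class.
    class_scores = np.full((point_feats.shape[0], num_classes), -np.inf)
    for c in range(num_classes):
        mask = proto_class == c
        if mask.any():
            class_scores[:, c] = sims[:, mask].max(axis=1)
    return class_scores.argmax(axis=1)

# Toy data: two prototypes for class 0, one for class 1, three lidar points.
proto_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
proto_class = np.array([0, 0, 1])
point_feats = np.array([[2.0, 0.1], [0.1, 3.0], [0.5, 0.4]])
labels = label_points(point_feats, proto_feats, proto_class)
print(labels.tolist())  # [0, 1, 0]
```

Because the class set is defined only by the prototype images, adding a new category in this sketch amounts to appending rows to `proto_feats` and `proto_class`, with no retraining of the 3D network.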
Problem

Research questions and friction points this paper is trying to address.

open-vocabulary semantic segmentation
3D lidar
zero-shot learning
modality gap
autonomous driving
Innovation

Methods, ideas, or system contributions that make the work stand out.

open-vocabulary semantic segmentation
LiDAR
image generation
Vision Foundation Model
zero-shot learning