🤖 AI Summary
This work addresses zero-shot recognition in open-vocabulary 3D semantic segmentation with a novel cross-modal alignment strategy that bridges the modality gap between LiDAR point clouds and textual descriptions. The method generates class-conditional prototype images from text prompts and distills knowledge from 2D vision foundation models into a 3D network, then aligns point cloud features with visual features extracted from the generated images to segment unseen categories. By introducing text-to-image generation into 3D open-vocabulary tasks for the first time, the framework replaces conventional direct cross-modal alignment. The proposed method achieves state-of-the-art results on both the nuScenes and SemanticKITTI benchmarks.
📝 Abstract
This paper presents a new method for zero-shot open-vocabulary semantic segmentation (OVSS) of 3D automotive LiDAR data. To circumvent the well-known image-text modality gap intrinsic to approaches based on Vision Language Models (VLMs) such as CLIP, our method instead relies on text-to-image generation to create prototype images. Given a 3D network distilled from a 2D Vision Foundation Model (VFM), we label a point cloud by matching 3D point features against 2D image features of these prototypes. Our method is state-of-the-art for OVSS on nuScenes and SemanticKITTI. Code, pre-trained models, and generated images are available at https://github.com/valeoai/IGLOSS.
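The matching step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, tensor shapes, and the choice to average the per-class prototype-image features into a single embedding before cosine-similarity matching are all assumptions for the sake of the example.

```python
import numpy as np

def assign_labels(point_feats, proto_feats):
    """Label 3D points by cosine similarity to class prototype embeddings.

    point_feats: (N, D) point features from the distilled 3D network.
    proto_feats: (C, K, D) 2D features from K generated prototype images
                 per class (shapes are illustrative, not from the paper).
    Returns: (N,) predicted class index per point.
    """
    # Average the K prototype-image features into one embedding per class.
    protos = proto_feats.mean(axis=1)                               # (C, D)
    # L2-normalize so dot products become cosine similarities.
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    c = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    sim = p @ c.T                                                   # (N, C)
    # Each point gets the class whose prototypes it resembles most.
    return sim.argmax(axis=1)
```

This avoids comparing point features to text embeddings directly, which is where the image-text modality gap would otherwise appear: both sides of the similarity are visual features from the same 2D feature space.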