Towards Learning to Complete Anything in Lidar

📅 2025-04-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Existing LiDAR-based shape completion methods rely on closed-set semantic assumptions: they are limited to categories seen during training and cannot generalize zero-shot to unseen object classes.

Method: We propose the first zero-shot, open-vocabulary framework for instance-level LiDAR completion. It leverages multimodal temporal sensor sequences (e.g., camera + LiDAR) to extract spatiotemporal shape and semantic priors, then uses knowledge distillation to train a generalizable single-frame LiDAR model. The framework integrates LiDAR-only instance segmentation, amodal shape generation, and open-vocabulary semantic alignment.

Contribution/Results: The method enables 3D shape completion, occlusion-free (amodal) 3D bounding-box localization, and cross-category recognition of arbitrary unseen classes. Evaluated on semantic and panoptic scene-completion benchmarks, it achieves state-of-the-art performance and is the first to break the closed-set constraint and support end-to-end zero-shot 3D object completion and understanding.
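The distillation step described above can be illustrated with a minimal sketch. This is not the paper's actual implementation; all function names and the index-aligned pairing of features are assumptions for illustration. The idea: a single-frame student model is trained so that its per-instance embeddings match the features mined from the multimodal teacher, here via a simple cosine-distance loss.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (plain Python lists)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def distill_loss(student_feats, teacher_feats):
    """Average (1 - cosine similarity) between student embeddings and the
    teacher features mined from multimodal temporal sequences.

    Each element is one instance's embedding; matching student/teacher
    pairs are assumed to be aligned by index (a hypothetical simplification).
    """
    assert len(student_feats) == len(teacher_feats)
    return sum(1.0 - cosine(s, t)
               for s, t in zip(student_feats, teacher_feats)) / len(student_feats)
```

When student and teacher embeddings agree exactly, the loss is zero; orthogonal embeddings give a loss of one, so minimizing it pulls the LiDAR-only student toward the mined multimodal features.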

📝 Abstract
We propose CAL (Complete Anything in Lidar), a method for Lidar-based shape completion in the wild. This task is closely related to Lidar-based semantic/panoptic scene completion; however, contemporary methods can only complete and recognize objects from a closed vocabulary labeled in existing Lidar datasets. In contrast, our zero-shot approach leverages the temporal context of multi-modal sensor sequences to mine object shapes and semantic features of observed objects. These are then distilled into a Lidar-only instance-level completion and recognition model. Although we mine only partial shape completions, we find that our distilled model learns to infer full object shapes from multiple such partial observations across the dataset. We show that our model can be prompted on standard benchmarks for Semantic and Panoptic Scene Completion, localize objects as (amodal) 3D bounding boxes, and recognize objects beyond fixed class vocabularies. Our project page is https://research.nvidia.com/labs/dvl/projects/complete-anything-lidar
Problem

Research questions and friction points this paper is trying to address.

Zero-shot Lidar shape completion for unseen objects
Leveraging multi-modal data for instance-level recognition
Generalizing beyond fixed class vocabularies in 3D detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot approach using multi-modal temporal context
Distills object shapes into Lidar-only model
Recognizes objects beyond fixed class vocabularies
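The last point, recognition beyond fixed class vocabularies, can be sketched as follows. Assuming (hypothetically) that each completed instance carries a feature aligned with a CLIP-style text-embedding space, open-vocabulary recognition reduces to picking the query-time prompt whose text embedding is most similar; the class list is supplied at inference, not baked in at training.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (plain Python lists)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def classify_open_vocab(instance_feat, prompt_feats):
    """Return the prompt whose text embedding best matches the instance.

    prompt_feats maps class-name prompts to their (hypothetical) text
    embeddings; the vocabulary is defined by the caller at query time.
    """
    return max(prompt_feats,
               key=lambda name: cosine(instance_feat, prompt_feats[name]))
```

For example, with toy 2-D embeddings, `classify_open_vocab([0.9, 0.1], {"car": [1.0, 0.0], "tree": [0.0, 1.0]})` returns `"car"`; adding a new class is just adding another prompt to the dictionary.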