🤖 AI Summary
To address the weak generalization of LiDAR-based 3D object detection under cross-domain and few-shot settings, this paper introduces the first Generalized Cross-Domain Few-Shot (GCFS) detection task. Methodologically: (1) we propose a physics-aware, 2D-guided 3D bounding box search strategy that leverages vision-language models to generate geometrically plausible candidate boxes; (2) we design a contrastively enhanced prototype learning mechanism to enable robust few-shot adaptation across domains, categories, and structural variations. Our contributions include: formally defining the GCFS paradigm, establishing three benchmark datasets, and proposing an end-to-end adaptive training framework. Experiments demonstrate that our method significantly outperforms existing approaches in detecting both novel and base classes, effectively mitigating the dual challenges of data scarcity and domain shift.
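The summary's "2D-guided 3D bounding box search" lifts 2D vision-language detections into 3D candidate boxes. The paper's exact procedure is not spelled out here; below is only a rough, generic frustum-lifting sketch under assumed pinhole-camera conventions (the function name `frustum_box_proposal` and the axis-aligned box fit are illustrative, not the authors' method):

```python
import numpy as np

def frustum_box_proposal(points_cam, box2d, K):
    """Illustrative lift of a 2D detection to a 3D box proposal:
    keep LiDAR points (already in camera coordinates) that project
    inside the 2D box, then fit an axis-aligned box to them."""
    # Project 3D points to pixel coordinates with intrinsics K.
    uvw = (K @ points_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]
    x1, y1, x2, y2 = box2d
    in_box = ((uv[:, 0] >= x1) & (uv[:, 0] <= x2) &
              (uv[:, 1] >= y1) & (uv[:, 1] <= y2) &
              (points_cam[:, 2] > 0))          # keep points in front of the camera
    pts = points_cam[in_box]
    if len(pts) == 0:
        return None
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    return (lo + hi) / 2.0, hi - lo            # box center, box size

# Toy usage: two points project inside the 2D box, one far off-axis does not.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0, 5.0], [0.1, 0.1, 5.0], [10.0, 0.0, 5.0]])
center, size = frustum_box_proposal(pts, (40, 40, 60, 60), K)
```

The paper's "physically-aware" strategy additionally reasons about laser imaging (e.g. which surfaces a LiDAR can actually return points from), which this naive min/max fit does not model.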
📝 Abstract
LiDAR-based 3D object detection datasets have been pivotal for autonomous driving, yet they cover a limited range of objects, restricting models' generalization across diverse deployment environments. To address this, we introduce the first generalized cross-domain few-shot (GCFS) task in 3D object detection, which focuses on adapting a source-pretrained model to perform well on both common and novel classes in a target domain given only few-shot samples. Our solution integrates multi-modal fusion and contrastive-enhanced prototype learning within one framework, holistically addressing the data-scarcity and domain-adaptation challenges of the GCFS setting. The multi-modal fusion module uses 2D vision-language models to extract rich, open-set semantic knowledge. To counter biases in point distributions across varying structural complexities, we introduce a physically-aware box searching strategy that leverages laser imaging principles to generate high-quality 3D box proposals from 2D insights, improving object recall. To capture domain-specific representations for each class from limited target data, we further propose a contrastive-enhanced prototype learning scheme that strengthens the model's adaptability. We evaluate our approach under three GCFS benchmark settings, and extensive experiments demonstrate its effectiveness on GCFS tasks. The code will be made publicly available.
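The contrastive-enhanced prototype learning described above is not detailed in the abstract; a common instantiation builds class prototypes from the few-shot features and applies an InfoNCE-style loss that pulls each feature toward its own prototype and away from the others. The following is a minimal sketch of that generic idea (names like `proto_contrastive_loss` and the temperature `tau` are assumptions, not the paper's specification):

```python
import numpy as np

def prototypes(feats, labels, num_classes):
    """Class prototypes = mean of L2-normalized few-shot features per class."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    protos = np.stack([f[labels == c].mean(axis=0) for c in range(num_classes)])
    return protos / np.linalg.norm(protos, axis=1, keepdims=True)

def proto_contrastive_loss(feats, labels, protos, tau=0.1):
    """InfoNCE-style loss over prototypes: each feature is attracted to its
    class prototype and repelled from all other class prototypes."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    logits = f @ protos.T / tau                       # cosine similarity / temperature
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()

# Toy usage: two well-separated classes give a near-zero loss with the
# correct labels and a large loss with swapped labels.
feats = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 1])
protos = prototypes(feats, labels, num_classes=2)
low = proto_contrastive_loss(feats, labels, protos)
high = proto_contrastive_loss(feats, labels[::-1], protos)
```

In the paper's setting, such a loss would shape per-class prototypes from the limited target-domain samples so the detector's features adapt to the new domain; the actual formulation may differ.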