MUSDA: Multi-source Multi-modality Unsupervised Domain Adaptive 3D Object Detection for Autonomous Driving

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses unsupervised domain adaptation for 3D object detection in autonomous driving when several labeled source domains and both camera and LiDAR data are available. The proposed framework transfers knowledge to an unlabeled target domain by jointly aligning camera and LiDAR features. It introduces hierarchical spatially conditioned (HSC) domain classifiers, which align features at two levels for each source-target domain pair, and builds a prototype graph between each pair of domains to drive a prototype graph weighted (PGW) fusion of the predictions from multiple source detection heads. Experiments on the Waymo, nuScenes, and Lyft datasets show that the method consistently outperforms state-of-the-art approaches.

📝 Abstract

With the advancement of autonomous driving, numerous annotated multi-modality datasets have become available. This presents an opportunity to develop domain-adaptive 3D object detectors for new environments without relying on labor-intensive manual annotations. However, traditional domain adaptation methods typically focus on a single source domain or a single modality, limiting their effectiveness in multi-source, multi-modality scenarios. In this paper, we propose a novel framework for multi-source, multi-modality unsupervised domain adaptation in 3D object detection for autonomous driving. Given multiple labeled source domains and one unlabeled target domain, our framework first introduces hierarchical spatially-conditioned (HSC) domain classifiers, which jointly align features from both camera and LiDAR modalities at two distinct levels for each source-target domain pair. To effectively leverage information from multiple source domains, we construct a prototype graph between each pair of domains. Based on this, we develop a prototype graph weighted (PGW) multi-source fusion strategy to aggregate predictions from multiple source detection heads. Experimental results on three widely used 3D object detection datasets - Waymo, nuScenes, and Lyft - demonstrate that our proposed framework effectively integrates information across both modalities and source domains, consistently outperforming state-of-the-art methods.
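
The abstract does not spell out how the prototype graph is turned into fusion weights, so the following is only a rough illustration of the general idea of prototype-similarity-weighted fusion, not the authors' PGW implementation. It assumes that each domain is summarized by per-class mean feature vectors (prototypes), that target prototypes come from pseudo-labels since the target domain is unlabeled, that each source is scored by the mean cosine similarity between matching class prototypes, and that a softmax over those scores gives one weight per source detection head. All function names and the `temperature` parameter are hypothetical.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Mean feature vector per class; zero vector for classes with no samples."""
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos

def cosine_similarity(a, b, eps=1e-8):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a_n @ b_n.T

def source_weights(source_protos, target_protos, temperature=1.0):
    """One weight per source domain (assumed scheme, not from the paper).

    Each source is scored by the mean similarity between its class prototypes
    and the target prototypes of the same class; a softmax normalizes the scores.
    """
    sims = np.array([
        np.mean(np.diag(cosine_similarity(sp, target_protos)))
        for sp in source_protos
    ])
    logits = sims / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

def fuse_scores(head_scores, weights):
    """Weighted average of per-head confidence scores.

    head_scores has shape (num_sources, num_boxes); the result has shape (num_boxes,).
    """
    return np.tensordot(weights, np.asarray(head_scores), axes=1)
```

In this sketch, a source domain whose prototypes lie closer to the target's receives a larger weight, so its detection head contributes more to the fused confidence. Lowering `temperature` sharpens the weighting toward the closest source.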

Problem

Research questions and friction points this paper is trying to address.

unsupervised domain adaptation

3D object detection

multi-source

multi-modality

autonomous driving

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-source domain adaptation

multi-modality fusion

unsupervised 3D object detection