🤖 AI Summary
To address the limited robustness of multimodal 3D detection in autonomous driving when sensors are missing or modality combinations are unseen at training time, this paper proposes a parameter-efficient fine-tuning framework based on deep metric learning. Methodologically, it integrates LoRA with adapter modules to align and dynamically fuse heterogeneous modalities (LiDAR, camera, radar, IMU, and GNSS) within a shared latent space, enabling reliable detection from arbitrary subsets of input modalities. The framework improves robustness to rapid motion, adverse weather, and cross-domain distribution shifts. Evaluated on the nuScenes benchmark, it achieves state-of-the-art detection accuracy and cross-scene stability, demonstrating strong generalization and practical applicability.
📝 Abstract
This study introduces PEFT-DML, a parameter-efficient deep metric learning framework for robust multi-modal 3D object detection in autonomous driving. Unlike conventional models that assume fixed sensor availability, PEFT-DML maps diverse modalities (LiDAR, radar, camera, IMU, GNSS) into a shared latent space, enabling reliable detection even under sensor dropout or unseen modality combinations. By integrating Low-Rank Adaptation (LoRA) and adapter layers, PEFT-DML trains only a small fraction of the model's parameters while improving robustness to fast motion, weather variability, and domain shifts. Experiments on the nuScenes benchmark demonstrate superior detection accuracy.
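The parameter-efficiency claim rests on LoRA's low-rank weight updates. A minimal NumPy sketch illustrates the mechanism; the dimensions, rank, and scaling below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for one frozen projection layer in a modality encoder.
d_in, d_out, rank = 256, 256, 8

W_frozen = rng.standard_normal((d_out, d_in))  # pretrained weight, kept fixed

# LoRA: train only a low-rank update B @ A instead of a full d_out x d_in matrix.
A = rng.standard_normal((rank, d_in)) * 0.01   # trainable
B = np.zeros((d_out, rank))                    # trainable; zero-init makes the
                                               # update a no-op before training
alpha = 16.0                                   # LoRA scaling hyperparameter

def lora_forward(x):
    """Frozen path plus scaled low-rank correction."""
    return x @ W_frozen.T + (alpha / rank) * (x @ A.T @ B.T)

x = rng.standard_normal((4, d_in))             # a batch of 4 feature vectors
y = lora_forward(x)

full_params = d_out * d_in                     # 65,536 weights in the full layer
lora_params = rank * (d_in + d_out)            # 4,096 trainable LoRA weights
print(y.shape, lora_params / full_params)      # trainable fraction = 0.0625
```

With rank 8 on a 256×256 layer, only about 6% of the layer's parameters are trained, which is the kind of saving that makes fine-tuning across many modality encoders tractable.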
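The deep-metric-learning idea of mapping each modality into a shared latent space can be sketched with a standard triplet margin loss; the encoders, dimensions, and margin here are toy assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy modality-specific encoders projecting into one shared latent space.
d_lidar, d_cam, d_shared = 64, 32, 16
W_lidar = rng.standard_normal((d_shared, d_lidar)) * 0.1
W_cam = rng.standard_normal((d_shared, d_cam)) * 0.1

def embed(x, W):
    """Project modality features and L2-normalize onto the unit sphere."""
    z = x @ W.T
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss: pull the positive (same object, other
    modality) closer to the anchor than the negative, by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

# One illustrative triplet: a LiDAR anchor, the matching camera view as
# positive, and a different object's camera view as negative.
lidar_feat = rng.standard_normal((1, d_lidar))
cam_pos = rng.standard_normal((1, d_cam))
cam_neg = rng.standard_normal((1, d_cam))

loss = triplet_loss(embed(lidar_feat, W_lidar),
                    embed(cam_pos, W_cam),
                    embed(cam_neg, W_cam))
```

Because all modalities land in the same normalized space, a detector trained on these embeddings can in principle consume whichever subset of sensors is available at inference time, which is the robustness property the abstract claims.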