CCF: Complementary Collaborative Fusion for Domain Generalized Multi-Modal 3D Object Detection

📅 2026-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the substantial performance degradation of multi-modal 3D object detection in cross-domain scenarios, such as rainy or nighttime conditions, as well as the over-reliance on LiDAR and insufficient exploitation of visual cues in existing methods. Focusing on dual-branch proposal-level detectors, the authors propose query-decoupled supervision, a LiDAR-guided instance-aware depth prior, and a complementary cross-modal masking mechanism to strengthen modality collaboration and adaptive fusion. By combining probabilistic depth estimation, spatial masking strategies, and multi-task query supervision, the approach improves multi-modal feature interaction and gradient balance. The method preserves source-domain performance while substantially outperforming current cross-domain 3D detection approaches, demonstrating superior robustness under adverse environmental conditions.
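The complementary cross-modal masking idea can be illustrated with a minimal sketch. This is a hypothetical toy implementation, not the paper's code: patches hidden from the image branch remain visible to the LiDAR (BEV) branch and vice versa, so fused queries cannot rely on a single modality seeing the whole scene.

```python
import numpy as np

def complementary_masks(h, w, patch=4, keep_ratio=0.5, seed=0):
    """Build complementary patch-level spatial masks for two modalities.

    Hypothetical sketch: patches kept in the image view are dropped in
    the point-cloud (BEV) view, and vice versa, forcing queries from
    both branches to compete and collaborate in the fused decoder.
    """
    rng = np.random.default_rng(seed)
    gh, gw = h // patch, w // patch
    keep_img = rng.random((gh, gw)) < keep_ratio  # patch grid visible to image branch
    # Upsample the patch grid to pixel resolution.
    img_mask = np.kron(keep_img.astype(int),
                       np.ones((patch, patch), dtype=int)).astype(bool)
    pc_mask = ~img_mask  # exact complement for the LiDAR branch
    return img_mask, pc_mask

img_mask, pc_mask = complementary_masks(8, 8, patch=4)
# Every spatial location is covered by exactly one modality.
assert np.all(img_mask ^ pc_mask)
```

In practice such masks would be applied to feature maps (or raw inputs) during training only, as a form of modality dropout rather than at inference time.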

📝 Abstract
Multi-modal fusion has emerged as a promising paradigm for accurate 3D object detection. However, performance degrades substantially when detectors are deployed in target domains that differ from the training domain. In this work, focusing on dual-branch proposal-level detectors, we identify two factors that limit robust cross-domain generalization: 1) in challenging domains such as rain or nighttime, one modality may undergo severe degradation; 2) the LiDAR branch often dominates the detection process, leading to systematic underutilization of visual cues and vulnerability when point clouds are compromised. To address these challenges, we propose three components. First, a Query-Decoupled Loss provides independent supervision for 2D-only, 3D-only, and fused queries, rebalancing gradient flow across modalities. Second, a LiDAR-Guided Depth Prior augments 2D queries with instance-aware geometric priors through probabilistic fusion of image-predicted and LiDAR-derived depth distributions, improving their spatial initialization. Third, Complementary Cross-Modal Masking applies complementary spatial masks to the image and point cloud, encouraging queries from both modalities to compete within the fused decoder and thereby promoting adaptive fusion. Extensive experiments demonstrate substantial gains over state-of-the-art baselines while preserving source-domain performance. Code and models are publicly available at https://github.com/IMPL-Lab/CCF.
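The probabilistic fusion of image-predicted and LiDAR-derived depth distributions can be sketched as follows. This is a minimal assumption-laden illustration, not the paper's exact formulation: both branches output categorical distributions over discrete depth bins, treated here as independent evidence combined by a normalized product, so bins supported by both modalities dominate the fused prior.

```python
import numpy as np

def fuse_depth_distributions(p_img, p_lidar, eps=1e-8):
    """Fuse two categorical depth distributions over the same depth bins.

    Hypothetical sketch: the normalized product of an image-predicted
    distribution and a LiDAR-derived distribution yields a fused prior
    that is sharpened wherever the modalities agree. `eps` guards
    against an all-zero product.
    """
    fused = p_img * p_lidar + eps
    return fused / fused.sum(axis=-1, keepdims=True)

# Image branch is uncertain across 4 bins; LiDAR is sharply peaked at bin 2.
p_img = np.array([0.25, 0.25, 0.25, 0.25])
p_lidar = np.array([0.05, 0.05, 0.85, 0.05])
fused = fuse_depth_distributions(p_img, p_lidar)
assert fused.argmax() == 2  # fused prior follows the confident LiDAR evidence
```

A fused prior of this kind would then initialize the 3D position of 2D queries, which is what "improving their spatial initialization" refers to in the abstract.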
Problem

Research questions and friction points this paper is trying to address.

Domain Generalization
Multi-Modal Fusion
3D Object Detection
Cross-Domain Robustness
Modality Degradation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain Generalization
Multi-Modal Fusion
3D Object Detection
Cross-Modal Masking
Depth Prior