🤖 AI Summary
This work addresses the geometric inconsistency between 4D radar and camera modalities in multi-agent collaborative perception, which arises from depth ambiguity and spatial sparsity. To this end, we propose RC-GeoCP, the first unified framework specifically designed for 4D radar–camera collaborative perception. Using radar as the geometric anchor, the approach introduces three key components: Geometric Structure Rectification (GSR), Uncertainty-Aware Communication (UAC), and the Consensus-Driven Assembler (CDA). Together they enable cross-modal semantic and geometric alignment while supporting efficient, selective feature transmission. We establish the first radar–camera collaborative perception benchmark on the V2X-Radar and V2X-R datasets, demonstrating significant improvements in perception accuracy and robustness under adverse weather conditions, alongside a substantial reduction in communication overhead.
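As a hedged illustration of what "radar as the geometric anchor" against camera depth ambiguity can mean, the Python sketch below applies a generic Bayesian reweighting: a per-pixel categorical depth distribution predicted from the image is multiplied by a Gaussian likelihood centered on the depth of a radar return that projects onto that pixel, then renormalized. This is a stand-in technique chosen for illustration, not the GSR module from the paper. The function name `rectify_depth`, the Gaussian noise model, and `sigma` (in the same units as the bin centers, e.g. meters) are all assumptions.

```python
import math
from typing import List, Optional


def rectify_depth(depth_probs: List[float], bin_centers: List[float],
                  radar_depth: Optional[float], sigma: float = 1.0) -> List[float]:
    """Hypothetical helper: reweight a camera depth distribution by a radar likelihood.

    depth_probs: camera-predicted probability for each depth bin (sums to 1).
    bin_centers: depth value represented by each bin.
    radar_depth: depth of a radar return projected onto this pixel, or None.
    sigma: assumed standard deviation of the radar range measurement.
    """
    if radar_depth is None:
        # No radar return projects onto this pixel: keep the camera estimate.
        return list(depth_probs)
    # Posterior is proportional to camera prior times Gaussian radar likelihood.
    weighted = [p * math.exp(-0.5 * ((d - radar_depth) / sigma) ** 2)
                for p, d in zip(depth_probs, bin_centers)]
    total = sum(weighted)
    if total == 0.0:
        # Radar depth lies far outside every bin (all weights underflow):
        # fall back to the camera estimate rather than divide by zero.
        return list(depth_probs)
    return [w / total for w in weighted]
```

For example, `rectify_depth([0.1, 0.8, 0.1], [10.0, 20.0, 30.0], 30.0, sigma=2.0)` moves almost all probability mass from the camera's (wrong) 20 m peak to the 30 m bin supported by the radar return, while pixels without a radar return are left unchanged.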
📝 Abstract
Collaborative perception (CP) enhances scene understanding through multi-agent information sharing. While LiDAR-centric systems offer precise geometry, their high cost and performance degradation in adverse weather necessitate multi-modal alternatives. Cameras provide dense visual semantics and 4D radar provides robust spatial measurements, yet the synergy between the two remains underexplored in collaborative settings. This work introduces RC-GeoCP, the first framework to explore the fusion of 4D radar and images in CP. To resolve cross-agent misalignment caused by depth ambiguity and spatial dispersion, RC-GeoCP establishes a radar-anchored geometric consensus. Specifically, Geometric Structure Rectification (GSR) aligns visual semantics with radar-derived geometry to generate spatially grounded, geometry-consistent representations. Uncertainty-Aware Communication (UAC) formulates selective transmission as a conditional entropy reduction process, prioritizing informative features according to inter-agent disagreement. Finally, the Consensus-Driven Assembler (CDA) aggregates multi-agent information via shared geometric anchors to form a globally coherent representation. We establish the first unified radar-camera CP benchmark on V2X-Radar and V2X-R, demonstrating state-of-the-art performance with significantly reduced communication overhead. Code will be released soon.
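The abstract does not specify how UAC scores features, so what follows is only a minimal Python sketch of entropy-reduction-driven selection under stated assumptions, not the RC-GeoCP implementation. It assumes each agent summarizes every BEV cell by a scalar foreground confidence, that the ego and collaborator evidence are conditionally independent given the true label (so they can be fused by adding log-odds under a uniform prior), and that the bandwidth budget is a fixed number of cells. All function names are hypothetical. The score rests on the identity H(Y | X_e) − H(Y | X_e, X_c) = I(Y; X_c | X_e) = E[KL(p(Y | X_e, X_c) ‖ p(Y | X_e))], so the per-cell KL divergence between the fused belief and the ego-only belief is a single-sample estimate of the conditional entropy reduction.

```python
import math
from typing import List

EPS = 1e-6


def _clip(p: float) -> float:
    # Keep probabilities away from 0 and 1 so logs and logits stay finite.
    return min(max(p, EPS), 1.0 - EPS)


def _logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def fuse_log_odds(p_ego: float, p_collab: float) -> float:
    """Naive-Bayes fusion of two confidences (independent evidence, uniform prior)."""
    z = _logit(_clip(p_ego)) + _logit(_clip(p_collab))
    return 1.0 / (1.0 + math.exp(-z))


def bernoulli_kl(p: float, q: float) -> float:
    """KL(Bern(p) || Bern(q)) in bits; always >= 0."""
    p, q = _clip(p), _clip(q)
    return p * math.log2(p / q) + (1.0 - p) * math.log2((1.0 - p) / (1.0 - q))


def information_gain(p_ego: List[float], p_collab: List[float]) -> List[float]:
    """Per-cell KL between fused and ego-only beliefs (entropy-reduction estimate)."""
    return [bernoulli_kl(fuse_log_odds(e, c), e) for e, c in zip(p_ego, p_collab)]


def select_cells(p_ego: List[float], p_collab: List[float], budget: int) -> List[int]:
    """Indices of the `budget` cells with the largest estimated information gain."""
    gains = information_gain(p_ego, p_collab)
    ranked = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    return sorted(ranked[:budget])
```

With `p_ego = [0.5, 0.9, 0.1, 0.5]`, `p_collab = [0.5, 0.9, 0.9, 0.95]` and `budget=2`, the sketch keeps cells 2 and 3. Cell 3 sharpens an uncertain ego belief, and cell 2, where the two agents disagree most, moves the ego belief furthest, which is in the spirit of the abstract's emphasis on inter-agent disagreement. Cells where both agents already agree confidently (cell 1) or carry no information (cell 0) are dropped. Using the KL form rather than a raw pointwise entropy difference keeps every score non-negative, since fusing conflicting evidence can raise the pointwise entropy even though it is informative.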