🤖 AI Summary
This work proposes an instance-aware early cooperative perception paradigm to address the high communication overhead and inefficiency of existing multimodal collaborative perception methods in autonomous driving, which typically rely on transmitting bandwidth-intensive locally fused features. The approach injects lightweight cooperative voxels from neighboring agents into the ego vehicle's local modality-fusion pipeline and introduces a heatmap-driven, on-demand communication mechanism that queries peers only for instances located in low-confidence regions. By combining cross-attention-based fusion with self-attention refinement over each agent's top-K most confident instances, the method recovers occluded objects while drastically reducing redundant data transmission. Evaluated on the OPV2V and DAIR-V2X datasets, the proposed framework achieves 73.01% AP@0.5 while reducing communication bandwidth by 87.98% compared with the current state-of-the-art.
📝 Abstract
Multi-modal collaborative perception has attracted great attention for enhancing the safety of autonomous driving. However, current multi-modal approaches follow a ``local fusion then communication'' sequence: they fuse multi-modal data locally and require high bandwidth to transmit each agent's feature data before collaborative fusion. EIMC proposes an early collaborative paradigm instead. It injects lightweight collaborative voxels, transmitted by neighboring agents, into the ego's local modality-fusion step, yielding compact yet informative 3D collaborative priors that tighten cross-modal alignment. Next, a heatmap-driven consensus protocol identifies exactly where cooperation is needed by computing per-pixel confidence heatmaps. Only the top-K instance vectors located in these low-confidence, high-discrepancy regions are queried from peers and then fused via cross-attention for completion. Afterwards, a refinement fusion collects the top-K most confident instances from each agent and enhances their features using self-attention. This instance-centric messaging reduces redundancy while guaranteeing that critical occluded objects are recovered. Evaluated on OPV2V and DAIR-V2X, EIMC attains 73.01\% AP@0.5 while reducing bandwidth usage by 87.98\% compared with the best published multi-modal collaborative detector. Code is publicly available at https://github.com/sidiangongyuan/EIMC.
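The two core communication steps described above (selecting top-K instances from a confidence heatmap, then fusing the queried peer vectors via cross-attention) can be sketched in NumPy. This is a hedged illustration, not the authors' implementation: the heatmap, feature tensors, and single-head attention here are simplified stand-ins for EIMC's actual modules.

```python
# Illustrative sketch (not the EIMC codebase): heatmap-driven top-K
# selection plus single-head scaled dot-product cross-attention fusion.
import numpy as np

def topk_instances(score_map, k):
    """Return flat indices of the k highest-scoring cells, most confident first."""
    flat = score_map.ravel()
    idx = np.argpartition(flat, -k)[-k:]          # unordered top-k
    return idx[np.argsort(flat[idx])[::-1]]       # sort descending by score

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention: ego queries, peer keys/values."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values                       # fused instance features

rng = np.random.default_rng(0)
H, W, K, D = 8, 8, 3, 16                          # toy grid and feature sizes
ego_heatmap = rng.random((H, W))                  # per-pixel confidence
ego_feats = rng.standard_normal((H * W, D))       # ego instance vectors (flattened grid)
peer_feats = rng.standard_normal((H * W, D))      # peer instance vectors

# Query the K lowest-confidence ego cells (where cooperation is needed),
# then complete them with peer features via cross-attention.
idx = topk_instances(1.0 - ego_heatmap, K)
fused = cross_attention(ego_feats[idx], peer_feats, peer_feats)
print(fused.shape)  # (3, 16)
```

The same `topk_instances` helper would also serve the refinement stage, where each agent's most confident instances are gathered before self-attention enhancement.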