EIMC: Efficient Instance-aware Multi-modal Collaborative Perception

📅 2026-03-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work proposes an instance-aware early cooperative perception paradigm to address the high communication overhead and inefficiency of existing multimodal collaborative perception methods in autonomous driving, which typically rely on transmitting bandwidth-intensive locally fused features. The approach integrates a lightweight cooperative voxel injection into the local processing pipeline and introduces a heatmap-driven, on-demand communication mechanism that queries only the top-K most confident critical instances. By combining cross-attention-based fusion with self-attention refinement, the method effectively recovers occluded objects while drastically reducing redundant data transmission. Evaluated on the OPV2V and DAIR-V2X datasets, the proposed framework achieves an AP@0.5 of 73.01% while reducing communication bandwidth by 87.98% compared to the current state-of-the-art.

📝 Abstract
Multi-modal collaborative perception has attracted significant attention for enhancing the safety of autonomous driving. However, current multi-modal approaches follow a ``local fusion then communication'' sequence: each agent fuses its multi-modal data locally and requires high bandwidth to transmit its feature data before collaborative fusion. EIMC instead proposes an early collaborative paradigm. It injects lightweight collaborative voxels, transmitted by neighboring agents, into the ego vehicle's local modality-fusion step, yielding compact yet informative 3D collaborative priors that tighten cross-modal alignment. Next, a heatmap-driven consensus protocol identifies exactly where cooperation is needed by computing per-pixel confidence heatmaps. Only the top-K instance vectors located in these low-confidence, high-discrepancy regions are queried from peers and then fused via cross-attention for completion. A refinement stage then collects the top-K most confident instances from each agent and enhances their features with self-attention. This instance-centric messaging reduces redundancy while guaranteeing that critical occluded objects are recovered. Evaluated on OPV2V and DAIR-V2X, EIMC attains 73.01\% AP@0.5 while reducing bandwidth usage by 87.98\% compared with the best published multi-modal collaborative detector. Code is publicly released at https://github.com/sidiangongyuan/EIMC.
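The heatmap-driven, top-K instance query described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the function names (`topk_query_positions`, `cross_attention`) and the concrete scoring formula combining low confidence with high ego/peer discrepancy are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def topk_query_positions(conf_ego, conf_peer, k):
    """Pick the k heatmap cells where cooperation is most needed.

    Assumed scoring: low ego confidence weighted by ego/peer discrepancy.
    """
    score = (1.0 - conf_ego) * np.abs(conf_ego - conf_peer)
    flat = score.ravel()
    idx = np.argpartition(flat, -k)[-k:]          # indices of k largest scores
    return np.stack(np.unravel_index(idx, conf_ego.shape), axis=1)  # (k, 2)

def cross_attention(q, kv, d):
    """Single-head scaled dot-product attention: ego queries, peer keys/values."""
    logits = q @ kv.T / np.sqrt(d)                # (k, k) attention logits
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)             # row-wise softmax
    return w @ kv                                 # fused instance features

H, W, d, K = 16, 16, 32, 5
conf_ego = rng.random((H, W))                     # ego per-pixel confidence heatmap
conf_peer = rng.random((H, W))                    # peer heatmap (assumed shared/aligned)
pos = topk_query_positions(conf_ego, conf_peer, K)

ego_inst = rng.standard_normal((K, d))            # ego instance vectors at queried cells
peer_inst = rng.standard_normal((K, d))           # peer instance vectors returned for them
fused = cross_attention(ego_inst, peer_inst, d)
print(pos.shape, fused.shape)
```

Only the K instance vectors (here K=5, d=32 floats each) cross the wire instead of a full feature map, which is the source of the bandwidth saving the paper reports.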
Problem

Research questions and friction points this paper is trying to address.

multi-modal collaborative perception
bandwidth efficiency
instance-aware communication
autonomous driving
collaborative fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

early collaborative perception
instance-aware communication
heatmap-driven consensus
cross-attention fusion
bandwidth-efficient V2X
👥 Authors
Kang Yang
School of Information, Renmin University of China, Beijing, China, 100872
Peng Wang
Renmin University of China
Lantao Li
Sony Research and Development Center China, Beijing, China
Tianci Bu
National University of Defense Technology, Hunan, China, 410073
Chen Sun
Sony
Deying Li
School of Information, Renmin University of China, Beijing, China, 100872
Yongcai Wang
School of Information, Renmin University of China, Beijing, China, 100872