🤖 AI Summary
This work addresses a limitation of cooperative perception: existing methods assume fixed or known collaborator configurations and break down when auxiliary agents have unknown or changing configurations, such as different LiDAR beam counts or network architectures. The authors propose ALF, a framework that converts lightweight bounding-box-level messages into pseudo-BEV maps and combines their object-centric cues with scene context from the ego vehicle's own features to synthesize ego-compatible latent features for previously unseen heterogeneous agents. ALF enables zero-shot heterogeneous cooperative perception without fine-tuning or adaptation, so new agents can be integrated plug-and-play in open-world settings. Evaluated on the V2X-Real dataset across 64 zero-shot case studies, ALF outperforms the strongest baseline by 35.91% in relative mAP@0.7 while requiring only 120 bytes per agent per frame (approximately 9.6 Kbps at 10 Hz).
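The bandwidth figure follows directly from the message size and frame rate. A minimal sketch of that arithmetic is given here; the function name `bandwidth_kbps` is illustrative and not taken from the paper, and 1 Kbps is assumed to mean 1000 bits per second.

```python
def bandwidth_kbps(bytes_per_frame: float, frame_rate_hz: float) -> float:
    """Convert a per-frame message size into a sustained bit rate in Kbps.

    Assumes 8 bits per byte and 1 Kbps = 1000 bits per second.
    """
    bits_per_second = bytes_per_frame * 8 * frame_rate_hz
    return bits_per_second / 1000.0


# 120 bytes per agent per frame, sent at 10 Hz.
print(bandwidth_kbps(120, 10))  # 9.6
```

The result scales linearly with the number of auxiliary agents, since the 120-byte figure is quoted per agent.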
📝 Abstract
Collaborative perception improves 3D object detection by enabling agents to share complementary observations, but most existing methods assume fixed or known collaborator encoder configurations, limiting deployment in practice. In this work, we consider an open-world setting in which auxiliary agents with unseen configurations may appear after deployment, such as different LiDAR beam counts or encoder architectures. To address this challenge, we propose ALF, a collaborative perception framework that enables zero-adaptation collaboration with unseen agent configurations by lifting lightweight box-level messages into ego-compatible auxiliary features. ALF converts auxiliary box-level messages into pseudo-BEV maps and synthesizes ego-compatible latent features by combining object-centric cues with scene context from the ego feature. On V2X-Real, under a zero-shot evaluation across 64 case studies, ALF outperforms the strongest prior baseline by 35.91% in relative mAP@0.7 while requiring only 120 bytes per agent per frame (approximately 9.6 Kbps bandwidth at 10 Hz).
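To make the idea of turning box-level messages into a pseudo-BEV map concrete, here is a rough sketch of one way to rasterize boxes onto a bird's-eye-view grid. It is not ALF's actual procedure, which the abstract does not specify. The box layout `(x, y, length, width, yaw, score)`, the 80 m grid extent, the 0.5 m cell size, and the single score channel are all assumptions made for illustration, and the learned step that synthesizes ego-compatible latent features is not covered.

```python
import numpy as np


def boxes_to_pseudo_bev(boxes, grid_range=(-40.0, 40.0), cell=0.5):
    """Rasterize oriented boxes into a single-channel BEV occupancy map.

    Hypothetical input format: each box is (x, y, length, width, yaw, score)
    in the ego frame, with metres for distances and radians for yaw.
    Rows of the output index y, columns index x.
    """
    lo, hi = grid_range
    n = int(round((hi - lo) / cell))
    # Coordinates of the cell centres along one axis.
    centers = lo + (np.arange(n) + 0.5) * cell
    gx, gy = np.meshgrid(centers, centers, indexing="xy")
    bev = np.zeros((n, n), dtype=np.float32)
    for x, y, length, width, yaw, score in boxes:
        dx, dy = gx - x, gy - y
        # Rotate cell offsets into the box frame so the test is axis-aligned.
        c, s = np.cos(yaw), np.sin(yaw)
        u = c * dx + s * dy    # offset along the box length
        v = -s * dx + c * dy   # offset along the box width
        inside = (np.abs(u) <= length / 2) & (np.abs(v) <= width / 2)
        # Keep the highest score where boxes overlap.
        bev[inside] = np.maximum(bev[inside], score)
    return bev


# One 4 m x 2 m box at the ego origin, heading along +x.
bev = boxes_to_pseudo_bev([(0.0, 0.0, 4.0, 2.0, 0.0, 0.9)])
print(bev.shape, int(np.count_nonzero(bev)))  # (160, 160) 32
```

A map like this only encodes where the auxiliary agent reported objects. In the framework described above, such object-centric cues are combined with scene context from the ego feature to produce the final auxiliary features.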