🤖 AI Summary
Ground-vehicle cooperative perception is significantly hindered by ground-plane occlusions and limited sensor viewpoints, resulting in substantial blind spots. To address this, the work proposes a heterogeneous ground-air collaborative perception framework that integrates unmanned aerial vehicles (UAVs) with ground agents. The framework employs a Cross-Domain Spatial Converter (CDSC) and a Spatial Offset Prediction Transformer (SOPT) to align and fuse features across the aerial and ground domains, further aided by explicit height supervision to mitigate inter-domain discrepancies. The study also introduces OPV2V-Air, the first benchmark for ground-air collaborative perception, extending the conventional vehicle-to-vehicle (V2V) paradigm to a vehicle-to-vehicle-to-UAV (V2V2UAV) setting. Experimental results on OPV2V-Air demonstrate that the proposed method improves 2D and 3D AP@0.7 by 4% and 7%, respectively, substantially outperforming existing approaches.
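The sketch below is a minimal, hypothetical illustration of the fusion step described above, not the authors' released code: a CDSC-style adapter maps UAV BEV features toward the ground domain, a small transformer regresses residual spatial offsets (standing in for the SOPT), and the re-aligned UAV features are fused with the vehicle features. All class and module names are assumptions.

```python
# Hedged sketch of ground-air BEV feature alignment and fusion (assumed names).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroundAirFusion(nn.Module):
    def __init__(self, channels=64, num_heads=4):
        super().__init__()
        # CDSC stand-in: 1x1 conv adapting UAV features toward the ground domain
        self.domain_adapter = nn.Conv2d(channels, channels, kernel_size=1)
        # SOPT stand-in: transformer over BEV tokens plus a head regressing
        # per-cell (dx, dy) offsets used to resample the UAV feature map
        self.offset_encoder = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True)
        self.offset_head = nn.Linear(channels, 2)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, veh_bev, uav_bev):
        # veh_bev, uav_bev: (B, C, H, W) BEV feature maps on a shared grid
        B, C, H, W = uav_bev.shape
        uav = self.domain_adapter(uav_bev)

        # Predict residual offsets from flattened UAV tokens
        tokens = uav.flatten(2).transpose(1, 2)                  # (B, H*W, C)
        offsets = self.offset_head(self.offset_encoder(tokens))  # (B, H*W, 2)
        offsets = offsets.transpose(1, 2).view(B, 2, H, W)

        # Identity sampling grid plus predicted offsets, then resample
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H, device=uav.device),
            torch.linspace(-1, 1, W, device=uav.device), indexing="ij")
        base = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2)
        grid = base + offsets.permute(0, 2, 3, 1)
        uav_aligned = F.grid_sample(uav, grid, align_corners=True)

        # Fuse aligned UAV features with the vehicle features
        return self.fuse(torch.cat([veh_bev, uav_aligned], dim=1))
```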
📝 Abstract
While Vehicle-to-Vehicle (V2V) collaboration extends sensing range through multi-agent data sharing, its reliability remains severely constrained by ground-level occlusions and the limited perspective of chassis-mounted sensors, which often result in critical perception blind spots. We propose OpenCOOD-Air, a novel framework that integrates UAVs as extensible aerial platforms into V2V collaborative perception to overcome these constraints. To mitigate gradient interference from ground-air domain gaps and data sparsity, we adopt a transfer learning strategy that fine-tunes UAV weights from pre-trained V2V models. To prevent the spatial information loss inherent in this transition, we formulate ground-air collaborative perception as a heterogeneous integration task with explicit altitude supervision and introduce a Cross-Domain Spatial Converter (CDSC) and a Spatial Offset Prediction Transformer (SOPT). Furthermore, we present the OPV2V-Air benchmark to validate the transition from V2V to Vehicle-to-Vehicle-to-UAV (V2V2UAV). Compared to state-of-the-art methods, our approach improves 2D and 3D AP@0.7 by 4% and 7%, respectively.
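As a rough illustration of the transfer learning strategy mentioned in the abstract, the snippet below initializes a ground-air model from a pre-trained V2V checkpoint and then fine-tunes the UAV-specific modules while the inherited ground backbone is updated only gently. The model class, parameter naming, checkpoint path, and learning rates are all assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch: fine-tuning UAV modules from pre-trained V2V weights (assumed names).
import torch
import torch.nn as nn

class GroundAirModel(nn.Module):
    """Toy stand-in for a V2V backbone plus new UAV-specific modules."""
    def __init__(self):
        super().__init__()
        self.ground_backbone = nn.Conv2d(64, 64, 3, padding=1)  # inherited from V2V
        self.uav_branch = nn.Conv2d(64, 64, 3, padding=1)       # new ground-air modules

model = GroundAirModel()

# Load pre-trained V2V weights; UAV modules missing from the checkpoint keep
# their random initialization (strict=False tolerates the missing keys).
# v2v_state = torch.load("v2v_pretrained.pth", map_location="cpu")
# model.load_state_dict(v2v_state, strict=False)

# Two parameter groups: train the UAV branch at a normal rate while the
# pre-trained ground backbone is updated at a much smaller learning rate.
uav_params = [p for n, p in model.named_parameters() if n.startswith("uav_")]
ground_params = [p for n, p in model.named_parameters() if not n.startswith("uav_")]
optimizer = torch.optim.AdamW([
    {"params": uav_params, "lr": 1e-3},
    {"params": ground_params, "lr": 1e-5},
])
```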