🤖 AI Summary
Ground-vehicle cooperative perception is significantly hindered by ground-plane occlusions and limited sensor viewpoints, resulting in substantial blind spots. To address this, the work proposes a heterogeneous ground-air collaborative perception framework that integrates unmanned aerial vehicles (UAVs) with ground agents. The framework employs a Cross-Domain Spatial Converter (CDSC) and a Spatial Offset Prediction Transformer (SOPT) to align and fuse features across the aerial and ground domains, further aided by explicit height supervision to mitigate inter-domain discrepancies. The study also introduces OPV2V-Air, the first benchmark for ground-air collaborative perception, extending the conventional vehicle-to-vehicle (V2V) paradigm to a vehicle-to-vehicle-to-UAV (V2V2UAV) setting. Experimental results on OPV2V-Air demonstrate that the proposed method improves 2D and 3D AP@0.7 by 4% and 7%, respectively, substantially outperforming existing approaches.
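The sketch below is a minimal, hypothetical illustration of the fusion step described above, not the authors' released code: a CDSC-style adapter maps UAV BEV features toward the ground domain, a small transformer regresses residual spatial offsets (standing in for the SOPT), and the re-aligned UAV features are fused with the vehicle features. All class and module names are assumptions.

```python
# Hedged sketch of ground-air BEV feature alignment and fusion (assumed names).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroundAirFusion(nn.Module):
    def __init__(self, channels=64, num_heads=4):
        super().__init__()
        # CDSC stand-in: 1x1 conv adapting UAV features toward the ground domain
        self.domain_adapter = nn.Conv2d(channels, channels, kernel_size=1)
        # SOPT stand-in: transformer over BEV tokens plus a head regressing
        # per-cell (dx, dy) offsets used to resample the UAV feature map
        self.offset_encoder = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True)
        self.offset_head = nn.Linear(channels, 2)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, veh_bev, uav_bev):
        # veh_bev, uav_bev: (B, C, H, W) BEV feature maps on a shared grid
        B, C, H, W = uav_bev.shape
        uav = self.domain_adapter(uav_bev)

        # Predict residual offsets from flattened UAV tokens
        tokens = uav.flatten(2).transpose(1, 2)                  # (B, H*W, C)
        offsets = self.offset_head(self.offset_encoder(tokens))  # (B, H*W, 2)
        offsets = offsets.transpose(1, 2).view(B, 2, H, W)

        # Identity sampling grid plus predicted offsets, then resample
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H, device=uav.device),
            torch.linspace(-1, 1, W, device=uav.device), indexing="ij")
        base = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2)
        grid = base + offsets.permute(0, 2, 3, 1)
        uav_aligned = F.grid_sample(uav, grid, align_corners=True)

        # Fuse aligned UAV features with the vehicle features
        return self.fuse(torch.cat([veh_bev, uav_aligned], dim=1))
```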
📝 Abstract
While Vehicle-to-Vehicle (V2V) collaboration extends sensing range through multi-agent data sharing, its reliability remains severely constrained by ground-level occlusions and the limited perspective of chassis-mounted sensors, which often result in critical perception blind spots. We propose OpenCOOD-Air, a novel framework that integrates UAVs as extensible aerial platforms into V2V collaborative perception to overcome these constraints. To mitigate gradient interference from ground-air domain gaps and data sparsity, we adopt a transfer learning strategy that fine-tunes UAV weights from pre-trained V2V models. To prevent the spatial information loss inherent in this transition, we formulate ground-air collaborative perception as a heterogeneous integration task with explicit altitude supervision and introduce a Cross-Domain Spatial Converter (CDSC) and a Spatial Offset Prediction Transformer (SOPT). Furthermore, we present the OPV2V-Air benchmark to validate the transition from V2V to Vehicle-to-Vehicle-to-UAV (V2V2UAV). Compared to state-of-the-art methods, our approach improves 2D and 3D AP@0.7 by 4% and 7%, respectively.
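As a rough illustration of the transfer learning strategy mentioned in the abstract, the snippet below initializes a ground-air model from a pre-trained V2V checkpoint and then fine-tunes the UAV-specific modules while the inherited ground backbone is updated only gently. The model class, parameter naming, checkpoint path, and learning rates are all assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch: fine-tuning UAV modules from pre-trained V2V weights (assumed names).
import torch
import torch.nn as nn

class GroundAirModel(nn.Module):
    """Toy stand-in for a V2V backbone plus new UAV-specific modules."""
    def __init__(self):
        super().__init__()
        self.ground_backbone = nn.Conv2d(64, 64, 3, padding=1)  # inherited from V2V
        self.uav_branch = nn.Conv2d(64, 64, 3, padding=1)       # new ground-air modules

model = GroundAirModel()

# Load pre-trained V2V weights; UAV modules missing from the checkpoint keep
# their random initialization (strict=False tolerates the missing keys).
# v2v_state = torch.load("v2v_pretrained.pth", map_location="cpu")
# model.load_state_dict(v2v_state, strict=False)

# Two parameter groups: train the UAV branch at a normal rate while the
# pre-trained ground backbone is updated at a much smaller learning rate.
uav_params = [p for n, p in model.named_parameters() if n.startswith("uav_")]
ground_params = [p for n, p in model.named_parameters() if not n.startswith("uav_")]
optimizer = torch.optim.AdamW([
    {"params": uav_params, "lr": 1e-3},
    {"params": ground_params, "lr": 1e-5},
])
```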