OpenCOOD-Air: Prompting Heterogeneous Ground-Air Collaborative Perception with Spatial Conversion and Offset Prediction

📅 2026-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Ground-vehicle cooperative perception is significantly hindered by occlusions on the ground plane and limited sensor viewpoints, resulting in substantial blind spots. To address this, the work proposes a heterogeneous ground-air collaborative perception framework that integrates unmanned aerial vehicles (UAVs) with ground agents. The framework employs a Cross-Domain Spatial Converter (CDSC) and a Spatial Offset Prediction Transformer (SOPT) to align and fuse features across the aerial and ground domains, further enhanced by explicit height supervision to mitigate inter-domain discrepancies. The study introduces OPV2V-Air, the first benchmark for ground-air collaborative perception, extending the conventional vehicle-to-vehicle (V2V) paradigm to a vehicle-to-vehicle-to-UAV (V2V2UAV) setting. Experimental results on OPV2V-Air demonstrate that the proposed method improves 2D and 3D AP@0.7 by 4% and 7%, respectively, substantially outperforming existing approaches.
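The offset-based alignment described above can be illustrated with a minimal sketch: warp the aerial BEV feature map into the ground frame using per-cell offsets, then fuse with the ground features. All names here (`apply_spatial_offsets`, the nearest-neighbour gather, max fusion) are hypothetical stand-ins for the paper's learned CDSC/SOPT modules, not the actual implementation.

```python
import numpy as np

def apply_spatial_offsets(aerial_bev, offsets):
    """Warp an aerial BEV feature map into the ground frame by reading each
    output cell from a per-cell offset location (nearest-neighbour gather).

    aerial_bev: (H, W, C) features from the UAV branch
    offsets:    (H, W, 2) integer (dy, dx) offsets per cell, standing in for
                the offsets a module like SOPT would predict
    """
    H, W, _ = aerial_bev.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Clamp source coordinates to the map so out-of-range offsets stay valid.
    src_y = np.clip(ys + offsets[..., 0], 0, H - 1)
    src_x = np.clip(xs + offsets[..., 1], 0, W - 1)
    return aerial_bev[src_y, src_x]

def fuse(ground_bev, aligned_aerial_bev):
    # Element-wise max fusion for illustration; the paper's fusion is learned.
    return np.maximum(ground_bev, aligned_aerial_bev)
```

For example, an aerial feature at cell (0, 0) is gathered into ground cell (1, 1) when that cell's predicted offset is (-1, -1).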

📝 Abstract
While Vehicle-to-Vehicle (V2V) collaboration extends sensing ranges through multi-agent data sharing, its reliability remains severely constrained by ground-level occlusions and the limited perspective of chassis-mounted sensors, which often result in critical perception blind spots. We propose OpenCOOD-Air, a novel framework that integrates UAVs as extensible platforms into V2V collaborative perception to overcome these constraints. To mitigate gradient interference from ground-air domain gaps and data sparsity, we adopt a transfer learning strategy to fine-tune UAV weights from pre-trained V2V models. To prevent the spatial information loss inherent in this transition, we formulate ground-air collaborative perception as a heterogeneous integration task with explicit altitude supervision and introduce a Cross-Domain Spatial Converter (CDSC) and a Spatial Offset Prediction Transformer (SOPT). Furthermore, we present the OPV2V-Air benchmark to validate the transition from V2V to Vehicle-to-Vehicle-to-UAV. Compared to state-of-the-art methods, our approach improves 2D and 3D AP@0.7 by 4% and 7%, respectively.
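The reported gains are in AP@0.7, i.e. average precision with detections counted as true positives only at IoU ≥ 0.7. A minimal illustration of that metric for axis-aligned 2D boxes follows; it is not the benchmark's evaluation code (real 2D/3D detection AP uses rotated or 3D boxes and interpolated precision), just the thresholded-matching idea.

```python
def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def average_precision(dets, gts, iou_thr=0.7):
    """AP at a fixed IoU threshold (e.g. AP@0.7).

    dets: list of (score, box); gts: list of box. Greedy score-ordered
    matching, step-wise area under the precision-recall curve.
    """
    matched = [False] * len(gts)
    tp, fp, ap, prev_recall = 0, 0, 0.0, 0.0
    for score, box in sorted(dets, key=lambda d: -d[0]):
        # Match each detection to the best still-unmatched ground truth.
        best_iou, best_i = 0.0, -1
        for i, g in enumerate(gts):
            v = iou_2d(box, g)
            if not matched[i] and v > best_iou:
                best_iou, best_i = v, i
        if best_iou >= iou_thr:
            matched[best_i] = True
            tp += 1
            recall = tp / len(gts)
            # Accumulate precision over each recall increment.
            ap += (recall - prev_recall) * (tp / (tp + fp))
            prev_recall = recall
        else:
            fp += 1
    return ap
```

With one of two ground truths detected perfectly and one false positive, this returns AP = 0.5, matching the intuition that half the recall range is covered at full precision.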
Problem

Research questions and friction points this paper is trying to address.

collaborative perception
occlusion
limited perspective
perception blind spots
heterogeneous sensing
Innovation

Methods, ideas, or system contributions that make the work stand out.

heterogeneous collaborative perception
spatial conversion
offset prediction
UAV-assisted V2V
cross-domain fusion
Xianke Wu
School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
Songlin Bai
Meituan Group, Beijing 100102, China
Chengxiang Li
School of Information and Intelligent Engineering, University of Sanya, Sanya 572000, Hainan Province, China
Zhiyao Luo
Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Yulin Tian
College of Mechanical and Electrical Engineering, Zhoukou Normal University, Zhoukou 466001, Henan Province, China
Fenghua Zhu
Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Yisheng Lv
The University of Chinese Academy of Sciences, and Chinese Academy of Sciences
Parallel Intelligence · AI for Transportation · Autonomous Vehicles · Parallel Transportation Systems
Yonglin Tian
Institute of Automation, Chinese Academy of Sciences
Parallel Intelligence · Parallel Unmanned Systems · Intelligent Vehicles · Autonomous Driving