AI Summary
Urban drone vision algorithms are hindered by the scarcity of annotated real-world data and high labeling costs. Method: This paper introduces the first large-scale multimodal drone dataset for urban scene understanding, integrating real and synthetic RGB images, depth maps, and semantic labels, spanning diverse weather conditions and day/night scenarios. It proposes a novel method to generate high-fidelity monocular depth maps for real drone imagery and establishes the first cross-domain adaptation benchmark to systematically evaluate synthetic-to-real generalization. Contribution/Results: The dataset and benchmark enable rigorous evaluation of model robustness and generalization, demonstrating significant improvements in both RGB-only and multimodal semantic segmentation. This work advances synthetic-data-driven drone perception research; the dataset and benchmark are publicly released.
Abstract
The development of computer vision algorithms for Unmanned Aerial Vehicle (UAV) applications in urban environments relies heavily on the availability of large-scale datasets with accurate annotations. However, collecting and annotating real-world UAV data is extremely challenging and costly. To address this limitation, we present FlyAwareV2, a novel multimodal dataset encompassing both real and synthetic UAV imagery tailored for urban scene understanding tasks. Building upon the recently introduced SynDrone and FlyAware datasets, FlyAwareV2 makes several key new contributions: 1) multimodal data (RGB, depth, semantic labels) across diverse environmental conditions, including varying weather and times of day; 2) depth maps for real samples computed via state-of-the-art monocular depth estimation; 3) benchmarks for RGB and multimodal semantic segmentation on standard architectures; 4) studies on synthetic-to-real domain adaptation to assess the generalization capabilities of models trained on synthetic data. With its rich set of annotations and environmental diversity, FlyAwareV2 provides a valuable resource for research on UAV-based 3D urban scene understanding.