🤖 AI Summary
This work addresses the limitations of existing drone simulation frameworks, which rely on manually crafted 3D environments that are difficult to scale and often lack physical plausibility and semantic consistency. To overcome these challenges, we propose the first hierarchical diffusion generative model tailored for aerial robotics tasks. By integrating hierarchy-aware tokenization with multi-branch feature extraction, our approach jointly models global scene layout and local geometric details, enabling progressive 3D scene synthesis. The model is coupled with a physics engine so that the generated environments are physically valid and directly usable for downstream tasks such as navigation and landing. Experiments on both a newly curated dataset and an established benchmark show that our method substantially outperforms existing approaches. We also use it to generate over 1,000 high-fidelity, physics-ready 3D scenes, which significantly enhance drone navigation performance.
📝 Abstract
Although generative models have shown substantial impact across multiple domains, their potential for scene synthesis remains underexplored in robotics. This gap is especially evident in drone simulators, where environments are still largely built by hand, a process that is time-consuming and difficult to scale. In this work, we introduce AeroScene, a hierarchical diffusion model for progressive 3D scene synthesis. Our approach leverages hierarchy-aware tokenization and multi-branch feature extraction to reason across both global layouts and local details, ensuring physical plausibility and semantic consistency. This makes AeroScene particularly suited to generating realistic scenes for aerial robotics tasks such as navigation, landing, and perching. We demonstrate its effectiveness through extensive experiments on our newly collected dataset and a public benchmark, showing that AeroScene significantly outperforms prior methods. Furthermore, we use AeroScene to generate a large-scale dataset of over 1,000 physics-ready, high-fidelity 3D scenes that can be directly integrated into NVIDIA Isaac Sim. Finally, we illustrate the utility of these generated environments on downstream drone navigation tasks. Our code and dataset are publicly available at aioz-ai.github.io/AeroScene/