FlyMirage: A Fully Automated Generation Pipeline for Diverse and Scalable UAV Flight Data via Generative World Model

📅 2026-05-19
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing vision-and-language navigation (VLN) datasets for aerial agents fall short on scale, diversity, and realism, and typically depend on either costly real-world data collection or visually limited simulation. This work proposes a fully automated data generation pipeline that combines large language models (LLMs) with a generative world model producing 3D Gaussian Splatting (3DGS) scenes. An LLM designs diverse, semantically rich environments, which the generative world model instantiates as high-fidelity 3DGS scenes. The pipeline then automates scene exploration and semantic annotation and generates dynamically feasible drone trajectories through these scenes. By minimizing human intervention, the method produces a large-scale, photorealistic aerial VLN dataset with dynamically feasible trajectories, intended to support training of next-generation embodied navigation models.
📝 Abstract
In the field of Vision-Language Navigation (VLN), aerial datasets remain limited in their ability to combine scale, diversity, and realism, often relying on either costly real-world scenes or visually limited simulations. To address these challenges, we introduce FlyMirage, a highly scalable and fully automated data generation pipeline for aerial VLN. Our approach leverages a large language model (LLM) as an environment designer to promote scene diversity, paired with a generative world model that instantiates these designs into high-fidelity 3D Gaussian Splatting (3DGS) scenes. To substantially reduce human labor and ensure the feasibility of flight data, FlyMirage automates scene exploration and semantic information acquisition, and further integrates a dynamically feasible planner for uncrewed aerial vehicle (UAV) trajectory generation. Using this toolchain, we generate a large-scale, diverse, and photorealistic aerial VLN dataset with dynamically feasible flight trajectories, designed to support the development of next-generation embodied navigation models.
Problem

Research questions and friction points this paper is trying to address.

Vision-Language Navigation
UAV flight data
dataset scalability
scene diversity
photorealism
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative World Model
3D Gaussian Splatting
Vision-Language Navigation
Autonomous UAV Trajectory Planning
Large Language Models
Authors

Jinhan Li
Undergraduate Student, New York University

Xijie Huang
Hong Kong University of Science and Technology
Efficient Deep Learning, Model Compression

Zhaoqi Wang
Differential Robotics, Hangzhou 311121, China

Yijin Wang
Undergraduate, Xidian University
Machine Learning

Weiqi Ge
Differential Robotics, Hangzhou 311121, China

Qiyi He
Differential Robotics, Hangzhou 311121, China

Mo Zhu
State Key Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou 310027, China; Differential Robotics, Hangzhou 311121, China

Fei Gao
Associate Professor, Zhejiang University
Aerial Robotics, Motion Planning, Autonomous Navigation

Yuze Wu
Zhejiang University
Control & Planning, Robot Learning, Embodied Intelligence

Xin Zhou
Differential Robotics, Hangzhou 311121, China