AirNav: A Large-Scale Real-World UAV Vision-and-Language Navigation Dataset with Natural and Diverse Instructions

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing research on vision-and-language navigation (VLN) for unmanned aerial vehicles (UAVs) predominantly relies on simulated environments, often suffering from unnatural instructions and limited scale. This work presents AirNav, the first large-scale, natural, and diverse VLN benchmark constructed from real-world urban aerial imagery, and introduces AirVLN-R1, a novel model that integrates supervised fine-tuning with reinforcement fine-tuning to enhance both navigation performance and generalization capability. Experimental results demonstrate the effectiveness of the proposed approach in real-world settings. The dataset, along with the code, has been publicly released, establishing a new foundation for advancing VLN research for UAVs in authentic environments.

📝 Abstract
Existing Unmanned Aerial Vehicle (UAV) Vision-Language Navigation (VLN) datasets face issues such as dependence on virtual environments, unnatural instructions, and limited scale. To address these challenges, we propose AirNav, a large-scale UAV VLN benchmark with natural and diverse instructions, constructed from real urban aerial data rather than synthetic environments. Additionally, we introduce AirVLN-R1, which combines Supervised Fine-Tuning and Reinforcement Fine-Tuning to enhance performance and generalization. The feasibility of the model is preliminarily evaluated through real-world tests. Our dataset and code are publicly available.
Problem

Research questions and friction points this paper is trying to address.

UAV
Vision-Language Navigation
real-world dataset
natural instructions
large-scale
Innovation

Methods, ideas, or system contributions that make the work stand out.

UAV Vision-Language Navigation
real-world aerial dataset
natural language instructions
reinforcement fine-tuning
large-scale benchmark
Hengxing Cai
Sun Yat-sen University
LLM, VLM, VLN, UAV
Yijie Rao
Beihang University
Ligang Huang
Peking University
Zanyang Zhong
School of Intelligent Systems Engineering, Sun Yat-Sen University
Jinhan Dong
Beijing University of Posts and Telecommunications
Jingjun Tan
School of Intelligent Systems Engineering, Sun Yat-Sen University
Wenhao Lu
Microsoft
AI, ML, CV, NLP
Renxin Zhong
School of Intelligent Systems Engineering, Sun Yat-Sen University