OpenVLN: Open-world aerial Vision-Language Navigation

📅 2025-11-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address data scarcity and long-horizon trajectory planning challenges in vision-language navigation (VLN) for complex outdoor aerial environments, this paper proposes a data-efficient open-world framework. The method reconfigures a reinforcement learning framework to fine-tune vision-language models (VLMs) with rule-based policies under limited training data, and introduces a value-driven long-horizon planner that dynamically synthesizes trajectories, enabling end-to-end mapping from natural-language instructions to robust flight policies. Evaluated on the TravelUAV benchmark across diverse reward settings, the approach achieves gains of up to 4.34% in Success Rate, 6.19% in Oracle Success Rate, and 4.07% in Success weighted by Path Length over baseline methods, demonstrating improved accuracy and generalization for long-range aerial navigation. The core contribution is a lightweight, rule- and value-coordinated training paradigm that jointly optimizes data efficiency and long-horizon planning capability.

📝 Abstract
Vision-language models (VLMs) have been widely applied in ground-based vision-language navigation (VLN). However, the vast complexity of outdoor aerial environments compounds data acquisition challenges and imposes long-horizon trajectory planning requirements on Unmanned Aerial Vehicles (UAVs), introducing novel complexities for aerial VLN. To address these challenges, we propose a data-efficient Open-world aerial Vision-Language Navigation (i.e., OpenVLN) framework, which can execute language-guided flight under limited data constraints and enhance long-horizon trajectory planning capabilities in complex aerial environments. Specifically, we reconfigure a reinforcement learning framework to optimize the VLM for UAV navigation tasks, which efficiently fine-tunes the VLM using rule-based policies under limited training data. Concurrently, we introduce a long-horizon planner for trajectory synthesis that dynamically generates precise UAV actions via value-based rewards. Finally, we conduct extensive navigation experiments on the TravelUAV benchmark with dataset scaling across diverse reward settings. Our method demonstrates consistent performance gains of up to 4.34% in Success Rate, 6.19% in Oracle Success Rate, and 4.07% in Success weighted by Path Length over baseline methods, validating its deployment efficacy for long-horizon UAV navigation in complex aerial environments.
Problem

Research questions and friction points this paper is trying to address.

Addressing data scarcity in aerial vision-language navigation tasks
Enhancing long-horizon trajectory planning for UAV navigation
Optimizing VLMs for complex outdoor aerial environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reconfigure reinforcement learning for VLM optimization
Fine-tune VLM using rule-based policies with limited data
Introduce long-horizon planner for dynamic trajectory synthesis
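The paper itself does not include code here. As an illustration only, the two ideas above — a rule-based reward for RL fine-tuning and value-based selection among candidate sub-trajectories — might be sketched as follows. All function names, the progress-based reward, and the averaging rule are hypothetical stand-ins, not the authors' implementation:

```python
import math


def rule_based_reward(pred_wp, prev_pos, goal, step_penalty=0.01):
    """Hypothetical rule-based reward: positive when the predicted
    waypoint reduces the straight-line distance to the goal, with a
    small per-step penalty to discourage wandering."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    progress = dist(prev_pos, goal) - dist(pred_wp, goal)
    return progress - step_penalty


def select_subtrajectory(candidates, value_fn):
    """Stand-in for value-based trajectory synthesis: score each
    candidate sub-trajectory by its mean estimated value and keep
    the best one."""
    return max(
        candidates,
        key=lambda traj: sum(value_fn(p) for p in traj) / len(traj),
    )


# Toy usage: descending toward a goal at the origin.
r = rule_based_reward(pred_wp=(0, 0, 9), prev_pos=(0, 0, 10), goal=(0, 0, 0))
best = select_subtrajectory(
    [[(0, 0, 5), (0, 0, 4)], [(0, 0, 2), (0, 0, 1)]],
    value_fn=lambda p: -p[2],  # higher value when closer to the ground-level goal
)
```

In a real pipeline the reward would be combined with the VLM's action log-probabilities inside a policy-gradient loop, and `value_fn` would be a learned value network rather than a hand-written heuristic.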
👥 Authors
Peican Lin — School of Automation Science and Engineering, South China University of Technology
Gan Sun — Professor, South China University of Technology (Machine Learning, Computer Vision, Artificial Intelligence, Data Mining)
Chenxi Liu — State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences
Fazeng Li — School of Automation Science and Engineering, South China University of Technology
Weihong Ren — Harbin Institute of Technology, Shenzhen (image restoration, multiple object tracking, action detection)
Yang Cong — State Key Laboratory of Robotics, SIA, Chinese Academy of Sciences (CAS) (computer vision, machine learning, multimedia, robotics)