🤖 AI Summary
This work addresses the limited explicit reasoning capability of unmanned aerial vehicles (UAVs) performing vision-and-language navigation (VLN) in complex outdoor environments. To this end, the authors propose an end-to-end navigation framework that integrates a chain-of-thought (CoT) reasoning mechanism. The model jointly maps first-person visual observations and natural language instructions into continuous navigation actions through a two-stage training strategy: supervised fine-tuning followed by reinforcement fine-tuning. The key contributions are the first adaptation of chain-of-thought reasoning to UAV-based VLN, which improves decision interpretability, and the creation of the first outdoor UAV-VLN dataset tailored to urban architectural settings. Experimental results demonstrate that the proposed method significantly outperforms baseline approaches in unseen test environments, improving both the robustness and the execution efficiency of UAV navigation in complex outdoor scenarios.
📝 Abstract
Vision-Language Navigation aims to enable agents to understand natural language instructions and carry out appropriate navigation actions in real-world environments. Most work focuses on indoor settings, with little research on complex outdoor scenes, and current UAV Vision-and-Language Navigation models typically act as black boxes without explicit reasoning. We introduce FreeFly-thinking, an end-to-end VLN framework that converts the UAV agent's egocentric images and language instructions into a sequence of actions, inspired by the urban-architecture environments proposed by OpenFly. We first construct a UAV dataset for the navigation task and then perform natural-language chain-of-thought reasoning over it. We adopt a two-stage training strategy: supervised fine-tuning followed by reinforcement fine-tuning. Experiments on unseen test environments demonstrate strong performance, showing the robustness and efficiency of our approach to UAV navigation.
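The two-stage strategy described above can be illustrated with a deliberately tiny sketch. This is not the paper's implementation: a 1-D linear "policy" stands in for the UAV model, expert demonstrations stand in for the dataset, and a simple accept-if-better perturbation stands in for reinforcement fine-tuning. All names, values, and the reward function are illustrative assumptions.

```python
# Toy sketch of the two-stage recipe from the abstract (all details hypothetical):
#   Stage 1 (SFT): regress the policy onto expert (observation -> action) pairs.
#   Stage 2 (RFT): nudge the policy toward actions that score a higher task reward,
#   using a crude hill-climbing stand-in for a policy-gradient method.
import random

random.seed(0)

def policy_action(w, obs):
    """Linear policy: predicted continuous action for a scalar observation."""
    return w * obs

def sft_step(w, batch, lr=0.1):
    """Supervised fine-tuning: one gradient step on squared error vs expert actions."""
    grad = 0.0
    for obs, expert_action in batch:
        grad += 2.0 * (policy_action(w, obs) - expert_action) * obs
    return w - lr * grad / len(batch)

def rft_step(w, obs, reward_fn, lr=0.05, noise=0.1):
    """Reinforcement fine-tuning stand-in: try a perturbed policy and move toward
    it only if it improves the task reward."""
    trial_w = w + random.uniform(-noise, noise)
    if reward_fn(policy_action(trial_w, obs)) > reward_fn(policy_action(w, obs)):
        return w + lr * (trial_w - w) / noise
    return w

# Hypothetical expert demonstrations: the "correct" action doubles the observation.
demos = [(o, 2.0 * o) for o in (0.5, 1.0, 1.5, 2.0)]

w = 0.0
for _ in range(50):                  # Stage 1: SFT pulls w toward the expert (w* = 2)
    w = sft_step(w, demos)

reward = lambda a: -abs(a - 2.0)     # reward peaks when action(obs=1.0) == 2.0
for _ in range(200):                 # Stage 2: RFT refines around the SFT solution
    w = rft_step(w, 1.0, reward)

print(round(w, 2))
```

The point of the sketch is the ordering: supervised fine-tuning gets the policy near the expert behavior cheaply, and the reward-driven stage then refines it using task feedback rather than labels, which is the structure the abstract ascribes to FreeFly-thinking.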