VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments

📅 2025-12-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address four key challenges in autonomous UAV navigation within complex environments—domain shift, weak temporal reasoning, low safety of generated actions, and difficulty in onboard deployment—this paper proposes the first edge-deployable vision-language-action (VLA) closed-loop navigation framework. Methodologically, it introduces a synthetic dataset built upon 3D Gaussian splatting and a progressive three-stage supervised training paradigm; further, it designs a lightweight real-time action decoder coupled with a geometrically constrained safety correction mechanism to ensure the physical feasibility of generated policies. Evaluated on resource-constrained onboard platforms, the framework achieves an 8.3× improvement in end-to-end inference throughput and a single-task success rate of up to 98.1%. It significantly enhances spatial referring comprehension, multi-step scene reasoning, and long-horizon navigation robustness.

📝 Abstract
This paper proposes VLA-AN, an efficient and onboard Vision-Language-Action (VLA) framework dedicated to autonomous drone navigation in complex environments. VLA-AN addresses four major limitations of existing large aerial navigation models: the data domain gap, insufficient temporal reasoning in navigation, safety issues with generative action policies, and onboard deployment constraints. First, we construct a high-fidelity dataset utilizing 3D Gaussian Splatting (3D-GS) to effectively bridge the domain gap. Second, we introduce a progressive three-stage training framework that sequentially reinforces scene comprehension, core flight skills, and complex navigation capabilities. Third, we design a lightweight, real-time action module coupled with geometric safety correction. This module ensures fast, collision-free, and stable command generation, mitigating the safety risks inherent in stochastic generative policies. Finally, through deep optimization of the onboard deployment pipeline, VLA-AN achieves a robust, real-time 8.3× improvement in inference throughput on resource-constrained UAVs. Extensive experiments demonstrate that VLA-AN significantly improves spatial grounding, scene reasoning, and long-horizon navigation, achieving a maximum single-task success rate of 98.1% and providing an efficient, practical solution for realizing full-chain closed-loop autonomy in lightweight aerial robots.
Problem

Research questions and friction points this paper is trying to address.

Bridging domain gaps between simulation and real-world drone navigation data
Ensuring safe and stable autonomous flight with real-time collision avoidance
Enabling efficient onboard deployment for resource-constrained aerial robots
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses 3D Gaussian Splatting to bridge domain gap
Implements progressive three-stage training for navigation
Deploys lightweight action module with safety correction
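The paper does not publish the safety-correction algorithm itself, but the idea of a geometric correction applied to a stochastically generated action can be sketched as follows. Everything here is a hypothetical illustration: the function name, the half-space constraint, and the parameters (`safety_margin`, `max_speed`) are assumptions, not the authors' implementation.

```python
import numpy as np

def geometric_safety_correction(action, obstacle_dir, obstacle_dist,
                                safety_margin=1.0, max_speed=2.0):
    """Make a generated velocity command physically feasible (sketch).

    action        -- (3,) velocity command from the generative policy, m/s
    obstacle_dir  -- (3,) unit vector from the drone toward the nearest obstacle
    obstacle_dist -- distance to that obstacle, m
    """
    v = np.asarray(action, dtype=float)
    # Clamp speed to the platform's physical limit.
    speed = np.linalg.norm(v)
    if speed > max_speed:
        v = v * (max_speed / speed)
    # Inside the safety margin, project out any velocity component that
    # points toward the obstacle (a simple half-space constraint).
    if obstacle_dist < safety_margin:
        toward = float(v @ np.asarray(obstacle_dir, dtype=float))
        if toward > 0.0:
            v = v - toward * np.asarray(obstacle_dir, dtype=float)
    return v
```

A filter of this shape is cheap enough to run at control rate on an embedded platform, which matches the paper's emphasis on real-time, collision-free command generation.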
👥 Authors

Yuze Wu
Zhejiang University
Control & Planning, Robot Learning, Embodied Intelligence

Mo Zhu
Zhejiang University, Differential Robotics

Xingxing Li
GFZ
GPS, GNSS precise positioning and orbit determination, GNSS data processing, GNSS seismology, GNSS meteorology

Yuheng Du
Differential Robotics

Yuxin Fan
University of Pennsylvania
Machine Learning, AI, Finance

Wenjun Li
Zhejiang University, Differential Robotics

Xin Zhou
Differential Robotics

Fei Gao
Zhejiang University, Differential Robotics