OnFly: Onboard Zero-Shot Aerial Vision-Language Navigation toward Safety and Efficiency

📅 2026-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses key challenges in zero-shot aerial vision-language navigation—namely, unstable decision-making, unreliable long-horizon progress monitoring, and the difficulty of balancing safety with efficiency—by proposing OnFly, a fully onboard real-time navigation framework. OnFly employs a shared-perception dual-agent architecture that decouples high-frequency goal generation from low-frequency progress assessment. It integrates a hybrid keyframe–recent-frame memory mechanism to stabilize KV caching and combines a semantic-geometric verifier with a receding-horizon planner to jointly optimize safety and efficiency. Experiments show that OnFly raises the simulation task success rate from 26.4% (the strongest prior baseline) to 67.8% and completes fully onboard real-world flight tests, confirming its suitability for real-time deployment.
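The summary mentions a hybrid keyframe–recent-frame memory that preserves global context while keeping the KV-cache prefix stable. The paper's actual selection criterion and data layout are not given here; the sketch below is only a plausible illustration of the idea, with the class name, the feature-distance keyframe test, and all parameters being assumptions. Keyframes are append-only (a stable prefix, analogous to a reusable KV-cache prefix), while recent frames live in a small sliding window.

```python
from collections import deque

class HybridFrameMemory:
    """Illustrative sketch (not the paper's implementation) of a hybrid
    keyframe / recent-frame memory: an append-only keyframe prefix plus
    a bounded sliding window of recent frames."""

    def __init__(self, recent_size=4, key_threshold=0.5):
        self.keyframes = []                      # append-only global context
        self.recent = deque(maxlen=recent_size)  # volatile local window
        self.key_threshold = key_threshold       # assumed novelty threshold

    def add(self, frame, feature):
        # Assumed criterion: promote a frame to keyframe when its feature
        # is sufficiently far from the last keyframe's feature.
        if not self.keyframes or self._dist(feature, self.keyframes[-1][1]) > self.key_threshold:
            self.keyframes.append((frame, feature))
        self.recent.append((frame, feature))

    def context(self):
        # Prompt context = stable keyframe prefix + recent-frame suffix,
        # so only the short suffix changes between VLM calls.
        return [f for f, _ in self.keyframes] + [f for f, _ in self.recent]

    @staticmethod
    def _dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

Because the keyframe list only ever grows at the end, the serialized prefix fed to the model is identical across steps, which is what keeps a KV cache reusable.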

📝 Abstract
Aerial vision-language navigation (AVLN) enables UAVs to follow natural-language instructions in complex 3D environments. However, existing zero-shot AVLN methods often suffer from unstable single-stream Vision-Language Model decision-making, unreliable long-horizon progress monitoring, and a trade-off between safety and efficiency. We propose OnFly, a fully onboard, real-time framework for zero-shot AVLN. OnFly adopts a shared-perception dual-agent architecture that decouples high-frequency target generation from low-frequency progress monitoring, thereby stabilizing decision-making. It further employs a hybrid keyframe-recent-frame memory to preserve global trajectory context while maintaining KV-cache prefix stability, enabling reliable long-horizon monitoring with termination and recovery signals. In addition, a semantic-geometric verifier refines VLM-predicted targets for instruction consistency and geometric safety using VLM features and depth cues, while a receding-horizon planner generates optimized collision-free trajectories under geometric safety constraints, improving both safety and efficiency. In simulation, OnFly improves task success from 26.4% (the strongest state-of-the-art baseline) to 67.8%, while fully onboard real-world flights validate its feasibility for real-time deployment. The code will be released at https://github.com/Robotics-STAR-Lab/OnFly.
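The abstract describes a receding-horizon planner that repeatedly generates collision-free trajectories. The paper's planner optimizes continuous trajectories under geometric safety constraints; the toy below only illustrates the receding-horizon pattern itself on a 2D grid, and every name, the bounded-depth BFS, and the Manhattan cost are assumptions made for the sketch. The loop plans a short-horizon path each step, executes only the first move, then replans from the new state.

```python
from collections import deque

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def plan_horizon(pos, goal, obstacles, horizon):
    # Bounded-depth BFS: among all states reachable within `horizon`
    # collision-free steps, find the one closest (Manhattan) to the goal
    # and return the first move of the path that reaches it.
    def dist(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    best_d, best_first = dist(pos), None
    frontier = deque([(pos, None, 0)])
    seen = {pos}
    while frontier:
        cur, first, depth = frontier.popleft()
        if dist(cur) < best_d:
            best_d, best_first = dist(cur), first
        if depth == horizon:
            continue
        for dx, dy in MOVES:
            nxt = (cur[0] + dx, cur[1] + dy)
            if nxt in obstacles or nxt in seen:
                continue
            seen.add(nxt)
            frontier.append((nxt, first or nxt, depth + 1))
    return best_first

def navigate(start, goal, obstacles, horizon=3, max_steps=30):
    # Receding-horizon loop: replan every step, commit only the first move.
    pos, path = start, [start]
    for _ in range(max_steps):
        if pos == goal:
            break
        move = plan_horizon(pos, goal, obstacles, horizon)
        if move is None:
            break
        pos = move
        path.append(pos)
    return path
```

Replanning at every step is what lets such a loop fold in fresh perception (new obstacles, updated targets) without committing to a stale long-horizon plan.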
Problem

Research questions and friction points this paper is trying to address.

Aerial Vision-Language Navigation
Zero-Shot Learning
Safety-Efficiency Trade-off
Long-Horizon Monitoring
Onboard UAV Navigation
Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-shot aerial navigation
vision-language model
dual-agent architecture
semantic-geometric verification
onboard real-time planning
Guiyong Zheng
School of Artificial Intelligence, Sun Yat-Sen University, Zhuhai, China; Southern University of Science and Technology, Shenzhen, China
Yueting Ban
Southern University of Science and Technology, Shenzhen, China
Mingjie Zhang
MPhil Student, The Hong Kong University of Science and Technology (Guangzhou)
Robotics · Vision-Language Navigation
Juepeng Zheng
School of Artificial Intelligence, Sun Yat-Sen University, Zhuhai, China
Boyu Zhou
Assistant Professor, SUSTech
Robotics · aerial robots · active perception · mobile manipulation