AirHunt: Bridging VLM Semantics and Continuous Planning for Efficient Aerial Object Navigation

📅 2026-01-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses three obstacles to deploying current vision-language models (VLMs) in real-time drone navigation: mismatched inference and planning frequencies, insufficient 3D scene understanding, and an imbalance between semantic guidance and motion efficiency. To overcome these limitations, we propose AirHunt, a system built on an asynchronous dual-path architecture that integrates VLM-based semantic reasoning with continuous path planning. AirHunt further introduces an active dual-task reasoning module and a semantic-geometric coherent planning module that dynamically reconcile semantic objectives with motion efficiency while adapting to environmental changes. Experimental results show that AirHunt improves success rates, reduces navigation errors, and shortens flight time across diverse outdoor open-set object navigation tasks. Real-world evaluations confirm its practicality and robustness under complex conditions.

📝 Abstract
Recent advances in large Vision-Language Models (VLMs) have provided rich semantic understanding that empowers drones to search for open-set objects via natural language instructions. However, prior systems struggle to integrate VLMs into practical aerial systems due to the orders-of-magnitude frequency mismatch between VLM inference and real-time planning, as well as VLMs' limited 3D scene understanding. They also lack a unified mechanism to balance semantic guidance with motion efficiency in large-scale environments. To address these challenges, we present AirHunt, an aerial object navigation system that efficiently locates open-set objects with zero-shot generalization in outdoor environments by seamlessly fusing VLM semantic reasoning with continuous path planning. AirHunt features a dual-pathway asynchronous architecture that establishes a synergistic interface between VLM reasoning and path planning, enabling continuous flight with adaptive semantic guidance that evolves through motion. Moreover, we propose an active dual-task reasoning module that exploits geometric and semantic redundancy to enable selective VLM querying, and a semantic-geometric coherent planning module that dynamically reconciles semantic priorities and motion efficiency in a unified framework, enabling seamless adaptation to environmental heterogeneity. We evaluate AirHunt across diverse object navigation tasks and environments, demonstrating a higher success rate with lower navigation error and reduced flight time compared to state-of-the-art methods. Real-world experiments further validate AirHunt's practical capability in complex and challenging environments. Code and dataset will be made publicly available before publication.
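
The frequency mismatch described in the abstract can be illustrated with a small toy simulation. The sketch below is not AirHunt's implementation: it is a single-threaded, tick-based stand-in for an asynchronous design, in which a slow "semantic" source refreshes region priorities only every `vlm_period` ticks while a fast planner re-selects a goal on every tick by trading semantic priority against travel distance. All names (`Candidate`, `slow_semantic_scores`, `pick_goal`, `travel_weight`), the region labels, and the scoring rule are assumptions made purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical candidate search region with a 2D position."""
    name: str
    position: tuple[float, float]


def slow_semantic_scores(tick: int) -> dict[str, float]:
    """Stand-in for a slow VLM query (assumption: hard-coded priorities).

    Before tick 10 the 'VLM' favors the park; afterwards the parking lot.
    """
    if tick < 10:
        return {"park": 0.9, "parking_lot": 0.2}
    return {"park": 0.1, "parking_lot": 0.95}


def pick_goal(drone, candidates, scores, travel_weight):
    """Pick the candidate maximizing semantic priority minus travel cost."""
    def utility(c):
        return scores.get(c.name, 0.0) - travel_weight * math.dist(drone, c.position)
    return max(candidates, key=utility)


def step_toward(drone, goal, speed):
    """Move the drone at most `speed` units along the straight line to goal."""
    dx, dy = goal[0] - drone[0], goal[1] - drone[1]
    dist = math.hypot(dx, dy)
    if dist <= speed:
        return goal
    return (drone[0] + speed * dx / dist, drone[1] + speed * dy / dist)


def simulate(num_ticks=30, vlm_period=10, travel_weight=0.01, speed=1.0):
    """Run the fast planner every tick; refresh semantics every vlm_period ticks."""
    candidates = [
        Candidate("park", (20.0, 0.0)),
        Candidate("parking_lot", (0.0, 20.0)),
    ]
    drone = (0.0, 0.0)
    scores: dict[str, float] = {}
    goals = []
    for tick in range(num_ticks):
        if tick % vlm_period == 0:
            # Slow path: semantic guidance arrives only occasionally.
            scores = slow_semantic_scores(tick)
        # Fast path: the planner never waits; it reuses the latest guidance.
        goal = pick_goal(drone, candidates, scores, travel_weight)
        drone = step_toward(drone, goal.position, speed)
        goals.append(goal.name)
    return goals, drone


if __name__ == "__main__":
    goals, drone = simulate()
    print(goals)
    print(drone)
```

In this toy setup the drone keeps flying between semantic updates and redirects as soon as new priorities arrive, which is the behavior the abstract attributes to continuous flight with evolving semantic guidance. A real system would run the two paths concurrently and use a far richer planner.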
Problem

Research questions and friction points this paper is trying to address.

Vision-Language Models
aerial navigation
continuous planning
semantic-geometric integration
open-set object search
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Models
continuous path planning
asynchronous architecture
semantic-geometric coherence
aerial object navigation
Xuecheng Chen
Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Zongzhuo Liu
Department of Mechanical and Energy Engineering, Southern University of Science and Technology, Shenzhen, China
Jianfa Ma
Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China
Bang Du
University of California San Diego
Tiantian Zhang
Tsinghua University
Reinforcement Learning, Clustering, Data Mining
Xueqian Wang
Tsinghua University
Information Fusion, Target Detection, Radar Imaging, Image Processing
Boyu Zhou
Assistant Professor, SUSTech
Robotics, aerial robots, active perception, mobile manipulation