🤖 AI Summary
This work addresses the challenge of enabling high-speed, safe navigation for quadrotor drones in complex environments using vision-language instructions. It proposes a training-free, asynchronous multi-agent architecture that decouples high-level semantic understanding from low-level flight control: a foreground agent interprets user instructions, while a background agent performs look-ahead reasoning, supported by a lightweight "impression graph" that retains critical scene information. This approach achieves, for the first time, integrated high-speed obstacle avoidance and task execution under vision-language guidance without any training. Experimental results show that the system outperforms existing baselines in simulation and executes complex language commands at speeds up to 5 m/s while navigating safely through cluttered real-world indoor environments.
📄 Abstract
We present QuadAgent, a training-free agent system for agile quadrotor flight guided by vision-language inputs. Unlike prior end-to-end or serial agent approaches, QuadAgent decouples high-level reasoning from low-level control using an asynchronous multi-agent architecture: Foreground Workflow Agents handle active tasks and user commands, while Background Agents perform look-ahead reasoning. The system maintains scene memory via the Impression Graph, a lightweight topological map built from sparse keyframes, and ensures safe flight with a vision-based obstacle avoidance network. Simulation results show that QuadAgent outperforms baseline methods in efficiency and responsiveness. Real-world experiments demonstrate that it can interpret complex instructions, reason about its surroundings, and navigate cluttered indoor spaces at speeds up to 5 m/s.
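To make the asynchronous decoupling concrete, here is a minimal sketch of the architecture the abstract describes: a background agent incrementally builds a lightweight topological scene memory from sparse keyframes while a foreground agent consumes user commands against that shared memory. All class and function names (`ImpressionGraph`, `foreground_agent`, `background_agent`) and the threading design are illustrative assumptions, not the paper's actual implementation.

```python
import queue
import threading

class ImpressionGraph:
    """Hypothetical lightweight topological scene memory built from sparse keyframes."""
    def __init__(self):
        self._lock = threading.Lock()
        self.nodes = {}       # keyframe id -> scene description
        self.edges = set()    # (id_a, id_b) traversability links

    def add_keyframe(self, kf_id, description, linked_to=None):
        with self._lock:
            self.nodes[kf_id] = description
            if linked_to is not None and linked_to in self.nodes:
                self.edges.add((linked_to, kf_id))

    def snapshot(self):
        with self._lock:
            return dict(self.nodes)

def foreground_agent(commands, graph, plan_log):
    """Handles active user commands against the current scene memory."""
    while True:
        cmd = commands.get()
        if cmd is None:           # sentinel: shut down
            break
        # A real system would query an LLM/VLM here; we just record
        # the command together with how much of the scene is mapped.
        plan_log.append((cmd, len(graph.snapshot())))

def background_agent(keyframes, graph):
    """Look-ahead reasoning: folds incoming keyframes into the graph."""
    prev = None
    for kf_id, desc in keyframes:
        graph.add_keyframe(kf_id, desc, linked_to=prev)
        prev = kf_id

graph, plan_log = ImpressionGraph(), []
commands = queue.Queue()
bg = threading.Thread(target=background_agent,
                      args=([(0, "hallway"), (1, "doorway"), (2, "lab")], graph))
fg = threading.Thread(target=foreground_agent, args=(commands, graph, plan_log))
bg.start(); fg.start()
bg.join()                        # background finishes mapping this batch
commands.put("fly to the lab")   # foreground handles the command asynchronously
commands.put(None)
fg.join()
```

The key design point mirrored here is that the two agents never block each other: the foreground loop stays responsive to commands while the background loop extends the impression graph, and the graph itself is the only shared state.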