🤖 AI Summary
This work addresses the challenge of enabling high-speed, safe navigation for quadrotor drones in complex environments using vision-language instructions. It proposes a training-free, asynchronous multi-agent architecture that decouples high-level semantic understanding from low-level flight control: a foreground agent interprets user instructions, while a background agent performs look-ahead reasoning, supported by a lightweight "impression graph" that retains critical scene information. This approach achieves, for the first time, integrated high-speed obstacle avoidance and task execution under vision-language guidance without any training. Experimental results show that the system outperforms existing baselines in simulation and executes complex language commands at speeds up to 5 m/s while navigating safely through cluttered real-world indoor environments.
📄 Abstract
We present QuadAgent, a training-free agent system for agile quadrotor flight guided by vision-language inputs. Unlike prior end-to-end or serial agent approaches, QuadAgent decouples high-level reasoning from low-level control using an asynchronous multi-agent architecture: Foreground Workflow Agents handle active tasks and user commands, while Background Agents perform look-ahead reasoning. The system maintains scene memory via the Impression Graph, a lightweight topological map built from sparse keyframes, and ensures safe flight with a vision-based obstacle avoidance network. Simulation results show that QuadAgent outperforms baseline methods in efficiency and responsiveness. Real-world experiments demonstrate that it can interpret complex instructions, reason about its surroundings, and navigate cluttered indoor spaces at speeds up to 5 m/s.
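To make the asynchronous decoupling concrete, here is a minimal sketch of the architecture the abstract describes: a background agent incrementally builds a lightweight topological scene memory from sparse keyframes while a foreground agent consumes user commands against that shared memory. All class and function names (`ImpressionGraph`, `foreground_agent`, `background_agent`) and the threading design are illustrative assumptions, not the paper's actual implementation.

```python
import queue
import threading

class ImpressionGraph:
    """Hypothetical lightweight topological scene memory built from sparse keyframes."""
    def __init__(self):
        self._lock = threading.Lock()
        self.nodes = {}       # keyframe id -> scene description
        self.edges = set()    # (id_a, id_b) traversability links

    def add_keyframe(self, kf_id, description, linked_to=None):
        with self._lock:
            self.nodes[kf_id] = description
            if linked_to is not None and linked_to in self.nodes:
                self.edges.add((linked_to, kf_id))

    def snapshot(self):
        with self._lock:
            return dict(self.nodes)

def foreground_agent(commands, graph, plan_log):
    """Handles active user commands against the current scene memory."""
    while True:
        cmd = commands.get()
        if cmd is None:           # sentinel: shut down
            break
        # A real system would query an LLM/VLM here; we just record
        # the command together with how much of the scene is mapped.
        plan_log.append((cmd, len(graph.snapshot())))

def background_agent(keyframes, graph):
    """Look-ahead reasoning: folds incoming keyframes into the graph."""
    prev = None
    for kf_id, desc in keyframes:
        graph.add_keyframe(kf_id, desc, linked_to=prev)
        prev = kf_id

graph, plan_log = ImpressionGraph(), []
commands = queue.Queue()
bg = threading.Thread(target=background_agent,
                      args=([(0, "hallway"), (1, "doorway"), (2, "lab")], graph))
fg = threading.Thread(target=foreground_agent, args=(commands, graph, plan_log))
bg.start(); fg.start()
bg.join()                        # background finishes mapping this batch
commands.put("fly to the lab")   # foreground handles the command asynchronously
commands.put(None)
fg.join()
```

The key design point mirrored here is that the two agents never block each other: the foreground loop stays responsive to commands while the background loop extends the impression graph, and the graph itself is the only shared state.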