🤖 AI Summary
To address the limited real-time cognitive processing and multi-step reasoning of unmanned aerial vehicles (UAVs) in complex vision-language-action (VLA) tasks, this paper proposes CognitiveDrone, a UAV-specific VLA model, and CognitiveDrone-R1, a variant augmented with a reasoning module. Methodologically, the authors introduce CognitiveDroneBench, the first dedicated evaluation benchmark for UAV-oriented VLA; design an end-to-end architecture integrating VLM-based reasoning preprocessing, a lightweight 4D action decoder, and a VLM-assisted instruction simplification module; and perform multimodal alignment training on over 8,000 simulated UAV trajectories. Experimental results show that CognitiveDrone-R1 achieves a 77.2% task success rate on CognitiveDroneBench, surpassing the racing-oriented RaceVLA baseline (31.3%) by 45.9 percentage points, and improves performance on core cognitive tasks by up to 30%, with notable gains in human recognition, symbol understanding, and multi-step reasoning under dynamic aerial conditions.
📝 Abstract
This paper introduces CognitiveDrone, a novel Vision-Language-Action (VLA) model tailored for complex Unmanned Aerial Vehicle (UAV) tasks that demand advanced cognitive abilities. Trained on a dataset of over 8,000 simulated flight trajectories across three key categories (Human Recognition, Symbol Understanding, and Reasoning), the model generates real-time 4D action commands from first-person visual inputs and textual instructions. To further enhance performance in intricate scenarios, we propose CognitiveDrone-R1, which integrates an additional Vision-Language Model (VLM) reasoning module to simplify task directives prior to high-frequency control. Experimental evaluations using our open-source benchmark, CognitiveDroneBench, reveal that while a racing-oriented model (RaceVLA) achieves an overall success rate of 31.3%, the base CognitiveDrone model reaches 59.6%, and CognitiveDrone-R1 attains a success rate of 77.2%. These results demonstrate improvements of up to 30% in critical cognitive tasks, underscoring the effectiveness of incorporating advanced reasoning capabilities into UAV control systems. Our contributions include the development of a state-of-the-art VLA model for UAV control and the introduction of the first dedicated benchmark for assessing cognitive tasks in drone operations. The complete repository is available at cognitivedrone.github.io.
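The two-stage idea described above, in which a reasoning module simplifies the directive before a fast policy issues 4D commands, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `TwoStageController`, the stand-in functions `toy_simplify` and `toy_policy`, the `reason_every` refresh interval, and the assumed action layout (three linear velocities plus a yaw rate) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Assumed layout of a 4D action: (vx, vy, vz, yaw_rate). The abstract only
# says "4D action commands"; this layout is an illustrative guess.
Action4D = Tuple[float, float, float, float]


@dataclass
class TwoStageController:
    """Hypothetical wrapper: slow instruction simplifier + fast action policy."""
    simplify: Callable[[str, object], str]      # stand-in for a VLM reasoner
    policy: Callable[[object, str], Action4D]   # stand-in for the VLA policy
    reason_every: int = 10                      # assumed refresh interval (steps)

    def run(self, frames: Iterable[object], instruction: str) -> List[Action4D]:
        actions: List[Action4D] = []
        directive = instruction
        for step, frame in enumerate(frames):
            # Refresh the simplified directive only occasionally; the policy
            # reuses the latest directive on every control step in between.
            if step % self.reason_every == 0:
                directive = self.simplify(instruction, frame)
            actions.append(self.policy(frame, directive))
        return actions


calls: List[object] = []  # records the frames on which the reasoner was invoked


def toy_simplify(instruction: str, frame: object) -> str:
    """Toy reasoner: rewrites one riddle-like instruction into a direct one."""
    calls.append(frame)
    if "2 + 2" in instruction:
        return "fly through the gate marked 4"
    return instruction


def toy_policy(frame: object, directive: str) -> Action4D:
    """Toy policy: always commands constant forward velocity."""
    return (1.0, 0.0, 0.0, 0.0)


controller = TwoStageController(toy_simplify, toy_policy, reason_every=10)
actions = controller.run(range(25), "fly through the gate showing 2 + 2")
print(len(actions), "actions;", len(calls), "reasoner calls")
```

In this toy setup the policy is called on every step while the simplifier is called only on every tenth frame, which mirrors the separation between occasional reasoning and high-frequency control; the real system's timing and interfaces may differ.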