CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs

📅 2025-03-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the limited real-time cognitive processing and multi-step reasoning capabilities of unmanned aerial vehicles (UAVs) in complex vision-language-action (VLA) tasks, this paper proposes CognitiveDrone, a UAV-specific VLA model, and CognitiveDrone-R1, a variant that adds a vision-language model (VLM) reasoning module to simplify instructions before high-frequency control. It also introduces CognitiveDroneBench, presented as the first dedicated benchmark for evaluating cognitive tasks in UAV-oriented VLA. The models are trained on over 8,000 simulated UAV flight trajectories and generate 4D action commands in real time from first-person visual input and text instructions. On CognitiveDroneBench, CognitiveDrone-R1 achieves a 77.2% task success rate, 45.9 percentage points above the racing-oriented RaceVLA (31.3%) and 17.6 percentage points above the base CognitiveDrone model (59.6%), with improvements of up to 30% on core cognitive tasks spanning human recognition, symbol understanding, and reasoning.

📝 Abstract

This paper introduces CognitiveDrone, a novel Vision-Language-Action (VLA) model tailored for complex Unmanned Aerial Vehicle (UAV) tasks that demand advanced cognitive abilities. Trained on a dataset comprising over 8,000 simulated flight trajectories across three key categories (Human Recognition, Symbol Understanding, and Reasoning), the model generates real-time 4D action commands based on first-person visual inputs and textual instructions. To further enhance performance in intricate scenarios, we propose CognitiveDrone-R1, which integrates an additional Vision-Language Model (VLM) reasoning module to simplify task directives prior to high-frequency control. Experimental evaluations using our open-source benchmark, CognitiveDroneBench, reveal that while a racing-oriented model (RaceVLA) achieves an overall success rate of 31.3%, the base CognitiveDrone model reaches 59.6%, and CognitiveDrone-R1 attains a success rate of 77.2%. These results demonstrate improvements of up to 30% in critical cognitive tasks, underscoring the effectiveness of incorporating advanced reasoning capabilities into UAV control systems. Our contributions include the development of a state-of-the-art VLA model for UAV control and the introduction of the first dedicated benchmark for assessing cognitive tasks in drone operations. The complete repository is available at cognitivedrone.github.io.
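
The overall success rates quoted in the abstract can be compared directly. The short Python sketch below computes the absolute gaps in percentage points between the three models. The dictionary and the helper name `gain_pp` are illustrative and do not come from the paper. The per-category figure ("up to 30%") cannot be derived from these overall numbers, so it is not reproduced here.

```python
# Overall success rates (%) as quoted in the abstract.
success_rates = {
    "RaceVLA": 31.3,
    "CognitiveDrone": 59.6,
    "CognitiveDrone-R1": 77.2,
}


def gain_pp(model: str, baseline: str) -> float:
    """Absolute gain of `model` over `baseline` in percentage points.

    Rounded to one decimal to match the precision of the reported rates.
    """
    return round(success_rates[model] - success_rates[baseline], 1)


if __name__ == "__main__":
    for model, baseline in [
        ("CognitiveDrone", "RaceVLA"),
        ("CognitiveDrone-R1", "CognitiveDrone"),
        ("CognitiveDrone-R1", "RaceVLA"),
    ]:
        print(f"{model} vs {baseline}: +{gain_pp(model, baseline)} pp")
```

Read this way, the base model accounts for most of the gap over RaceVLA (28.3 points), and the added reasoning module contributes a further 17.6 points.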

Problem

Research questions and friction points this paper is trying to address.

Develops a Vision-Language-Action model for UAV cognitive tasks.

Introduces a benchmark for evaluating drone cognitive task performance.

Enhances UAV reasoning capabilities with a new reasoning module.

Innovation

Methods, ideas, or system contributions that make the work stand out.

VLA model for real-time UAV cognitive tasks

Integrates Vision-Language Model for enhanced reasoning

Open-source benchmark for cognitive task evaluation
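
The points above describe a two-stage design in which a VLM simplifies the instruction and a VLA model then issues high-frequency 4D commands. The following is a minimal Python sketch of that control flow, not the authors' implementation. Every name in it (`Action4D`, `simplify_instruction`, `vla_policy`, `control_loop`, `reason_every`) is hypothetical. The layout of the 4D action as three velocities plus a yaw rate is an assumption, as is re-running the reasoner every few control steps: the abstract says only that directives are simplified "prior to high-frequency control".

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Action4D:
    # Assumed layout; the abstract only states "4D action commands".
    vx: float
    vy: float
    vz: float
    yaw_rate: float


def simplify_instruction(instruction: str, frame: object) -> str:
    """Hypothetical stand-in for the VLM reasoning module.

    A real system would query a vision-language model with the frame and
    the instruction; this placeholder only normalizes the text.
    """
    return instruction.strip().lower()


def vla_policy(frame: object, instruction: str) -> Action4D:
    """Hypothetical stand-in for the VLA model; returns a hover command."""
    return Action4D(0.0, 0.0, 0.0, 0.0)


def control_loop(
    frames: Sequence[object],
    instruction: str,
    reason_every: int = 10,  # assumed ratio of control steps to reasoning steps
    reasoner: Callable[[str, object], str] = simplify_instruction,
    policy: Callable[[object, str], Action4D] = vla_policy,
) -> List[Action4D]:
    """Run the slow reasoner occasionally and the fast policy on every frame."""
    actions: List[Action4D] = []
    simplified = instruction
    for step, frame in enumerate(frames):
        if step % reason_every == 0:
            # Low-frequency path: refresh the simplified directive.
            simplified = reasoner(instruction, frame)
        # High-frequency path: one 4D command per frame.
        actions.append(policy(frame, simplified))
    return actions
```

Separating the two rates keeps the slower reasoning step out of the per-frame control path, which is the motivation the abstract gives for simplifying directives before control.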

🔎 Similar Papers

AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models