HUGE-Bench: A Benchmark for High-Level UAV Vision-Language-Action Tasks

📅 2026-03-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing benchmarks for unmanned aerial vehicle (UAV) vision-language navigation predominantly rely on detailed step-by-step instructions, which limits their ability to evaluate whether agents can safely execute complex, multi-stage tasks from concise, high-level directives. To address this gap, this work proposes HUGE-Bench, a vision-language-action (VLA) benchmark designed for high-level semantic understanding and safe execution in UAV navigation. HUGE-Bench uses scalable digital twin environments built on an aligned 3D Gaussian Splatting and mesh representation, and comprises 4 real-world scenes, 8 high-level tasks, and 2.56 million meters of trajectories. It also introduces evaluation metrics that assess process fidelity, terminal accuracy, and collision-aware safety. Experiments show that current state-of-the-art VLA models fall significantly short in completing high-level instructions and in executing tasks safely.

📝 Abstract

Existing UAV vision-language navigation (VLN) benchmarks have enabled language-guided flight, but they largely focus on long, step-wise route descriptions with goal-centric evaluation, making them less diagnostic for real operations where brief, high-level commands must be grounded into safe multi-stage behaviors. We present HUGE-Bench, a benchmark for High-Level UAV Vision-Language-Action (HL-VLA) tasks that tests whether an agent can interpret concise language and execute complex, process-oriented trajectories with safety awareness. HUGE-Bench comprises 4 real-world digital twin scenes, 8 high-level tasks, and 2.56M meters of trajectories, and is built on an aligned 3D Gaussian Splatting (3DGS)-Mesh representation that combines photorealistic rendering with collision-capable geometry for scalable generation and collision-aware evaluation. We introduce process-oriented and collision-aware metrics to assess process fidelity, terminal accuracy, and safety. Experiments on representative state-of-the-art VLA models reveal significant gaps in high-level semantic completion and safe execution, highlighting HUGE-Bench as a diagnostic testbed for high-level UAV autonomy.
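The abstract does not define its metrics, so the following is only a minimal sketch of the distinction it draws between goal-centric and process-oriented, collision-aware evaluation. Every name here (`terminal_error`, `process_deviation`, `collision_rate`) is hypothetical, the formulas are illustrative assumptions rather than the benchmark's actual definitions, and spherical obstacles stand in for the mesh geometry.

```python
import math

# Hypothetical illustration only: these are NOT the HUGE-Bench metric definitions.

def terminal_error(flown, reference):
    """Goal-centric view: distance between the final flown and reference waypoints."""
    return math.dist(flown[-1], reference[-1])

def process_deviation(flown, reference):
    """Process-oriented view: mean distance from each reference waypoint
    to the nearest flown waypoint (0 means every stage was visited)."""
    return sum(min(math.dist(r, p) for p in flown) for r in reference) / len(reference)

def collision_rate(flown, obstacles, radius):
    """Safety view: fraction of flown waypoints inside any obstacle sphere
    (spheres are an assumed stand-in for collision-capable mesh geometry)."""
    hits = sum(1 for p in flown if any(math.dist(p, o) < radius for o in obstacles))
    return hits / len(flown)

# Reference trajectory flies an L-shaped route; the agent cuts the corner
# diagonally and passes through an obstacle on the way.
reference = [(0, 0, 10), (10, 0, 10), (10, 10, 10)]
shortcut = [(0, 0, 10), (5, 5, 10), (10, 10, 10)]
obstacles = [(5, 5, 10)]

print(terminal_error(shortcut, reference))       # goal reached exactly
print(process_deviation(shortcut, reference))    # but the route was not followed
print(collision_rate(shortcut, obstacles, 2.0))  # and one waypoint collides
```

In this toy case the shortcut reaches the goal with zero terminal error, yet it skips a stage of the route and collides once, which is the kind of failure a goal-only score would miss.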

Problem

Research questions and friction points this paper is trying to address.

UAV

vision-language-action

high-level command

safety-aware navigation

benchmark

Innovation

Methods, ideas, or system contributions that make the work stand out.

High-Level UAV Autonomy

Vision-Language-Action

3D Gaussian Splatting