DeTrack: A Benchmark and Altitude-Aware Dual World Model for Drone-embodied Tracking

📅 2026-05-17
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing aerial object tracking benchmarks treat drones as passive cameras, overlooking their role as embodied agents that actively perceive and control their motion in dynamic 3D environments. This work introduces DeTrack, a new drone-embodied tracking task, along with a large-scale benchmark of 11,368 target trajectories that supports active flight control and interaction within simulated 3D environments, complemented by a closed-loop evaluation protocol. To address this setting, the authors propose AaDWorlds, a framework that combines an altitude-aware perception module with dual world models that imagine future states under high- and low-altitude regimes. By fusing pseudo altitude-aware observations with these imagined states, it produces closed-loop flight policies that balance target visibility against flight safety. Experiments on the DeTrack benchmark show that AaDWorlds outperforms existing methods on target visibility, tracking accuracy, and trajectory success rate.
📝 Abstract
Aerial object tracking has broad applications in public safety, emergency rescue, wildlife monitoring, and related fields. However, existing aerial tracking benchmarks are mainly based on passive 2D video sequences captured from fixed camera locations or predefined flight paths, where drones are treated as passive cameras rather than embodied agents that actively perceive, interact, and control their motion in dynamic 3D scenes. In this paper, we define a new drone-embodied tracking task, termed DeTrack, which requires a drone to track a target in interactive 3D environments using online egocentric observations and active flight control in a closed loop. We build a large-scale benchmark containing 11,368 target trajectories across diverse scenes, rendering conditions, semantic regions, and moving distractors, together with evaluation metrics for target visibility, tracking accuracy, and trajectory success. We further propose AaDWorlds, an altitude-aware dual world model framework for drone-embodied tracking. AaDWorlds consists of an altitude-aware perception module and dual world models that imagine future states under both high- and low-altitude regimes. By combining pseudo altitude-aware observations and imagined future states, AaDWorlds alleviates the intrinsic altitude-mediated contradiction between target visibility and flight safety. Experiments on the DeTrack benchmark demonstrate that AaDWorlds improves closed-loop tracking performance across all evaluation metrics.
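To make the closed-loop setting described in the abstract concrete, here is a minimal toy sketch in Python. It is not the DeTrack simulator, the benchmark's metric definitions, or the AaDWorlds policy. Everything in it is an assumption made for illustration: the `State` class, the `visible`, `policy`, and `rollout` functions, the downward-facing camera with a 45-degree half field of view, and the proportional pursuit rule. The toy policy also reads the target position directly, whereas the task as described relies on egocentric observations. The sketch only shows the loop structure (observe, act, step the environment, score visibility) and how altitude changes the ground area a camera can cover.

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical toy state: planar positions plus a fixed drone altitude."""
    drone_xy: tuple
    altitude: float
    target_xy: tuple


def visible(state, half_fov_deg=45.0):
    """Assumed visibility test for a downward-facing camera.

    The ground footprint radius is altitude * tan(half field of view).
    """
    radius = state.altitude * math.tan(math.radians(half_fov_deg))
    dx = state.target_xy[0] - state.drone_xy[0]
    dy = state.target_xy[1] - state.drone_xy[1]
    return math.hypot(dx, dy) <= radius


def policy(state, max_speed=2.0):
    """Toy pursuit rule: move toward the target, clipped to max_speed per step."""
    dx = state.target_xy[0] - state.drone_xy[0]
    dy = state.target_xy[1] - state.drone_xy[1]
    dist = math.hypot(dx, dy)
    if dist <= max_speed:
        return (dx, dy)
    scale = max_speed / dist
    return (dx * scale, dy * scale)


def rollout(steps=50, altitude=10.0, target_speed=1.0):
    """Run one closed-loop episode and return the fraction of visible steps."""
    state = State((0.0, 0.0), altitude, (5.0, 0.0))
    seen = 0
    for _ in range(steps):
        # Closed loop: the action depends on the current state,
        # and the next state depends on the action.
        ax, ay = policy(state)
        drone = (state.drone_xy[0] + ax, state.drone_xy[1] + ay)
        target = (state.target_xy[0] + target_speed, state.target_xy[1])
        state = State(drone, altitude, target)
        seen += visible(state)
    return seen / steps


if __name__ == "__main__":
    for h in (1.0, 3.0, 10.0):
        print(f"altitude={h:>4}: visibility rate={rollout(altitude=h):.2f}")
```

In this toy, a higher altitude widens the camera footprint and so keeps the target in view more often. It does not model obstacles or collisions, so it captures only the visibility side of the visibility-versus-safety tension the abstract mentions.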
Problem

Research questions and friction points this paper is trying to address.

drone-embodied tracking
aerial object tracking
active perception
3D interactive environments
altitude-aware tracking
Innovation

Methods, ideas, or system contributions that make the work stand out.

drone-embodied tracking
altitude-aware perception
dual world model
closed-loop control
aerial object tracking