Instance-level Visual Active Tracking with Occlusion-Aware Planning

📅 2026-04-23

🤖 AI Summary
This work addresses two causes of failure in visual active tracking: weak instance-level discrimination, which lets visually similar distractors be confused with the target, and the lack of active planning under occlusion. The authors propose OA-VAT, a unified framework with three modules: training-free offline instance prototype initialization, an online tracker that refines the prototypes and uses a confidence-aware Kalman filter, and occlusion-aware trajectory planning with a conditional diffusion model. Prototypes are built by aggregating multi-view augmented features from DINOv3 and are updated continuously during tracking. The planning module is trained on the newly introduced Planning-20k dataset. Reported results are an average success rate (SR) of 0.93 in the UnrealCV simulation environment, an average CAR of 90.8% on real-world datasets, and a TSR of 81.6% on a DJI Tello drone, with the system running in real time at 35 FPS on an RTX 3090 GPU.
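
As a rough illustration of the prototype idea summarized above, the following sketch averages L2-normalized per-view feature vectors into a single prototype and then picks the candidate with the highest cosine similarity. It is not the authors' implementation: the feature vectors are hypothetical stand-ins for DINOv3 embeddings, and the function names (`build_prototype`, `pick_target`) and the simple mean aggregation are assumptions made for this example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_prototype(view_features):
    """Average unit-length view features, then renormalize.

    Assumption: mean pooling stands in for whatever aggregation
    the paper actually uses.
    """
    if not view_features:
        raise ValueError("need at least one view feature")
    normalized = [l2_normalize(f) for f in view_features]
    dim = len(normalized[0])
    mean = [sum(f[i] for f in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def pick_target(prototype, candidates):
    """Return (index, score) of the candidate most similar to the prototype."""
    scores = [cosine_similarity(prototype, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical 3-D features for three augmented views of the target.
views = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [1.0, -0.1, 0.0]]
prototype = build_prototype(views)

# Candidate 0 is a distractor, candidate 1 resembles the target.
candidates = [[0.0, 1.0, 0.0], [1.0, 0.05, 0.0]]
best_index, best_score = pick_target(prototype, candidates)
```

With real embeddings the vectors would have hundreds of dimensions, but the matching step stays the same: the prototype is compared against every detected candidate and the closest one is treated as the tracked instance.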

📝 Abstract
Visual Active Tracking (VAT) aims to control cameras to follow a target in 3D space, which is critical for applications like drone navigation and security surveillance. However, it faces two key bottlenecks in real-world deployment: confusion from visually similar distractors caused by insufficient instance-level discrimination and severe failure under occlusions due to the absence of active planning. To address these, we propose OA-VAT, a unified pipeline with three complementary modules. First, a training-free Instance-Aware Offline Prototype Initialization aggregates multi-view augmented features via DINOv3 to construct discriminative instance prototypes, mitigating distractor confusion. Second, an Online Prototype Enhancement Tracker enhances prototypes online and integrates a confidence-aware Kalman filter for stable tracking under appearance and motion changes. Third, an Occlusion-Aware Trajectory Planner, trained on our new Planning-20k dataset, uses conditional diffusion to generate obstacle-avoiding paths for occlusion recovery. Experiments demonstrate OA-VAT achieves 0.93 average SR on UnrealCV (+2.2% vs. SOTA TrackVLA), 90.8% average CAR on real-world datasets (+12.1% vs. SOTA GC-VAT), and 81.6% TSR on a DJI Tello drone. Running at 35 FPS on an RTX 3090, it delivers robust, real-time performance for practical deployment.
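
The abstract does not spell out how the confidence-aware Kalman filter uses the tracker's confidence, so the sketch below shows one common way to do it, under stated assumptions: a 1-D constant-velocity filter whose measurement noise is divided by the detection confidence, so low-confidence detections pull the state less, and a zero-confidence frame (for example, full occlusion) skips the update entirely. The class name, the noise parameters, and the confidence-to-noise mapping are all hypothetical choices for illustration.

```python
class ConfidenceAwareKalman1D:
    """Constant-velocity Kalman filter over one coordinate (position, velocity)."""

    def __init__(self, pos=0.0, vel=0.0, p_var=1.0, q=0.01, base_r=1.0):
        self.pos, self.vel = pos, vel
        # 2x2 state covariance stored as four scalars.
        self.p00, self.p01, self.p10, self.p11 = p_var, 0.0, 0.0, p_var
        self.q = q            # process noise added to each diagonal term
        self.base_r = base_r  # measurement noise at confidence 1.0

    def predict(self, dt=1.0):
        """Propagate state and covariance: x = F x, P = F P F^T + Q."""
        self.pos += self.vel * dt
        a = self.p00 + dt * self.p10
        b = self.p01 + dt * self.p11
        self.p00 = a + dt * b + self.q
        self.p01 = b
        self.p10 = self.p10 + dt * self.p11
        self.p11 = self.p11 + self.q

    def update(self, z, confidence):
        """Fuse a position measurement z, trusted in proportion to confidence."""
        if confidence <= 0.0:
            return  # assumed behavior: no usable detection, keep the prediction
        r = self.base_r / min(confidence, 1.0)  # lower confidence -> more noise
        s = self.p00 + r                        # innovation variance
        k0, k1 = self.p00 / s, self.p10 / s     # Kalman gain for H = [1, 0]
        y = z - self.pos                        # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        # P = (I - K H) P, computed from the pre-update covariance.
        p00, p01 = self.p00, self.p01
        self.p00 = (1.0 - k0) * p00
        self.p01 = (1.0 - k0) * p01
        self.p10 = self.p10 - k1 * p00
        self.p11 = self.p11 - k1 * p01


# Same measurement, different confidence: the confident filter moves further.
confident, doubtful = ConfidenceAwareKalman1D(), ConfidenceAwareKalman1D()
for kf, conf in ((confident, 0.9), (doubtful, 0.1)):
    kf.predict()
    kf.update(z=10.0, confidence=conf)
```

In a tracker, one such filter per box coordinate (or a joint multi-dimensional filter) would smooth the target's motion, with the appearance-matching score serving as the confidence input.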
Problem

Research questions and friction points this paper is trying to address.

Visual Active Tracking
Instance-level Discrimination
Occlusion Handling
Distractor Confusion
Active Planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Instance-level Discrimination
Occlusion-Aware Planning
Prototype-based Tracking
Conditional Diffusion
Visual Active Tracking