Interactive Tracking: A Human-in-the-Loop Paradigm with Memory-Augmented Adaptation

📅 2026-04-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work proposes a novel paradigm for interactive visual tracking that enables users to dynamically guide the tracking process through natural language instructions, addressing the limitations of existing non-interactive trackers in scenarios requiring real-time human intervention. To facilitate research in this direction, the authors introduce InteractTrack, the first large-scale benchmark for interactive tracking, comprising 150 videos densely annotated with temporally aligned language instructions. They further present IMAT, a baseline model incorporating a dynamic memory mechanism to effectively fuse linguistic guidance with visual cues and support online adaptive updates. Experimental results demonstrate that state-of-the-art trackers substantially degrade under interactive settings, whereas IMAT exhibits robust human-in-the-loop tracking performance on the new benchmark, thereby advancing the field of interactive perception.
📝 Abstract
Existing visual trackers mainly operate in a non-interactive, fire-and-forget manner, making them impractical for real-world scenarios that require human-in-the-loop adaptation. To overcome this limitation, we introduce Interactive Tracking, a new paradigm that allows users to guide the tracker at any time using natural language commands. To support research in this direction, we make three main contributions. First, we present InteractTrack, the first large-scale benchmark for interactive tracking, containing 150 videos with dense bounding box annotations and timestamped language instructions. Second, we propose a comprehensive evaluation protocol and evaluate 25 representative trackers, showing that state-of-the-art methods fail in interactive scenarios; strong performance on conventional benchmarks does not transfer. Third, we introduce Interactive Memory-Augmented Tracking (IMAT), a new baseline that employs a dynamic memory mechanism to learn from user feedback and update tracking behavior accordingly. Our benchmark, protocol, and baseline establish a foundation for developing more intelligent, adaptive, and collaborative tracking systems, bridging the gap between automated perception and human guidance. The full benchmark, tracking results, and analysis are available at https://github.com/NorahGreen/InteractTrack.git.
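The abstract describes IMAT's dynamic memory mechanism only at a high level: it stores user feedback and fuses linguistic guidance with visual cues, supporting online updates. As a rough illustration only (not the authors' implementation; the class, capacity, fusion weights, and FIFO eviction policy are all assumptions made for this sketch), a minimal language-keyed memory with attention-based readout might look like:

```python
import numpy as np

class InteractiveMemory:
    """Toy dynamic memory: stores (instruction, visual) feature pairs
    and fuses them with the current frame feature via softmax attention.
    Illustrative only; not the IMAT architecture from the paper."""

    def __init__(self, dim, capacity=8):
        self.dim = dim
        self.capacity = capacity
        self.keys = []    # instruction embeddings (what the user said)
        self.values = []  # visual features seen when the instruction arrived

    def write(self, instruction_emb, visual_feat):
        # Online adaptive update: newest entries evict the oldest (FIFO).
        self.keys.append(instruction_emb)
        self.values.append(visual_feat)
        if len(self.keys) > self.capacity:
            self.keys.pop(0)
            self.values.pop(0)

    def read(self, frame_feat):
        # Attend over stored instructions; blend the retrieved guidance
        # with the current frame feature (equal weights, an assumption).
        if not self.keys:
            return frame_feat
        K = np.stack(self.keys)      # (n, dim)
        V = np.stack(self.values)    # (n, dim)
        scores = K @ frame_feat / np.sqrt(self.dim)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        guidance = weights @ V
        return 0.5 * frame_feat + 0.5 * guidance
```

With no memory entries the tracker falls back to the plain frame feature, so the model behaves like a conventional tracker until the user intervenes; each instruction then biases subsequent readouts toward the visual state it was issued in.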
Problem

Research questions and friction points this paper is trying to address.

interactive tracking
human-in-the-loop
visual tracking
adaptive systems
natural language guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interactive Tracking
Human-in-the-Loop
Memory-Augmented Adaptation
Natural Language Guidance
Dynamic Memory Mechanism