Interactive Tracking: A Human-in-the-Loop Paradigm with Memory-Augmented Adaptation

📅 2026-04-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work proposes a novel paradigm for interactive visual tracking that enables users to dynamically guide the tracking process through natural language instructions, addressing the limitations of existing non-interactive trackers in scenarios requiring real-time human intervention. To facilitate research in this direction, the authors introduce InteractTrack, the first large-scale benchmark for interactive tracking, comprising 150 videos densely annotated with temporally aligned language instructions. They further present IMAT, a baseline model incorporating a dynamic memory mechanism to effectively fuse linguistic guidance with visual cues and support online adaptive updates. Experimental results demonstrate that state-of-the-art trackers substantially degrade under interactive settings, whereas IMAT exhibits robust human-in-the-loop tracking performance on the new benchmark, thereby advancing the field of interactive perception.
📝 Abstract
Existing visual trackers mainly operate in a non-interactive, fire-and-forget manner, making them impractical for real-world scenarios that require human-in-the-loop adaptation. To overcome this limitation, we introduce Interactive Tracking, a new paradigm that allows users to guide the tracker at any time using natural language commands. To support research in this direction, we make three main contributions. First, we present InteractTrack, the first large-scale benchmark for interactive tracking, containing 150 videos with dense bounding box annotations and timestamped language instructions. Second, we propose a comprehensive evaluation protocol and evaluate 25 representative trackers, showing that state-of-the-art methods fail in interactive scenarios; strong performance on conventional benchmarks does not transfer. Third, we introduce Interactive Memory-Augmented Tracking (IMAT), a new baseline that employs a dynamic memory mechanism to learn from user feedback and update tracking behavior accordingly. Our benchmark, protocol, and baseline establish a foundation for developing more intelligent, adaptive, and collaborative tracking systems, bridging the gap between automated perception and human guidance. The full benchmark, tracking results, and analysis are available at https://github.com/NorahGreen/InteractTrack.git.
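The abstract describes IMAT's dynamic memory mechanism only at a high level: it stores user feedback and fuses linguistic guidance with visual cues, supporting online updates. As a rough illustration only (not the authors' implementation; the class, capacity, fusion weights, and FIFO eviction policy are all assumptions made for this sketch), a minimal language-keyed memory with attention-based readout might look like:

```python
import numpy as np

class InteractiveMemory:
    """Toy dynamic memory: stores (instruction, visual) feature pairs
    and fuses them with the current frame feature via softmax attention.
    Illustrative only; not the IMAT architecture from the paper."""

    def __init__(self, dim, capacity=8):
        self.dim = dim
        self.capacity = capacity
        self.keys = []    # instruction embeddings (what the user said)
        self.values = []  # visual features seen when the instruction arrived

    def write(self, instruction_emb, visual_feat):
        # Online adaptive update: newest entries evict the oldest (FIFO).
        self.keys.append(instruction_emb)
        self.values.append(visual_feat)
        if len(self.keys) > self.capacity:
            self.keys.pop(0)
            self.values.pop(0)

    def read(self, frame_feat):
        # Attend over stored instructions; blend the retrieved guidance
        # with the current frame feature (equal weights, an assumption).
        if not self.keys:
            return frame_feat
        K = np.stack(self.keys)      # (n, dim)
        V = np.stack(self.values)    # (n, dim)
        scores = K @ frame_feat / np.sqrt(self.dim)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        guidance = weights @ V
        return 0.5 * frame_feat + 0.5 * guidance
```

With no memory entries the tracker falls back to the plain frame feature, so the model behaves like a conventional tracker until the user intervenes; each instruction then biases subsequent readouts toward the visual state it was issued in.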
Problem

Research questions and friction points this paper is trying to address.

interactive tracking
human-in-the-loop
visual tracking
adaptive systems
natural language guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interactive Tracking
Human-in-the-Loop
Memory-Augmented Adaptation
Natural Language Guidance
Dynamic Memory Mechanism