🤖 AI Summary
To address the fundamental trade-off between speed and accuracy in visual object tracking under resource-constrained conditions, this paper proposes DyTrack, a dynamic Transformer framework. Methodologically, DyTrack introduces the first dynamic inference-path mechanism tailored for tracking: it integrates early-exit branches at intermediate layers, cross-layer feature reuse, and target-aware self-distillation to enable frame-wise, complexity-adaptive computation allocation within a single model. It combines dynamic network routing with sequential decision modeling, implemented via a lightweight Transformer architecture. On the LaSOT benchmark, DyTrack achieves 64.9% AUC while running at 256 FPS, significantly outperforming existing methods at comparable speeds and establishing a new state of the art in the speed-accuracy trade-off for real-time tracking.
📝 Abstract
The speed-precision tradeoff is a critical problem in visual object tracking, which typically demands low latency and is deployed on resource-constrained platforms. Existing solutions for efficient tracking primarily rely on lightweight backbones or modules, which, however, sacrifice precision. In this article, inspired by dynamic network routing, we propose DyTrack, a dynamic transformer framework for efficient tracking. Real-world tracking scenarios exhibit varying levels of complexity. We argue that a simple network is sufficient for easy video frames, while more computational resources should be assigned to difficult ones. DyTrack automatically learns to configure proper reasoning routes for different inputs, thereby improving the utilization of the available computational budget and achieving higher performance at the same running speed. We formulate instance-specific tracking as a sequential decision problem and incorporate terminating branches into intermediate layers of the model. Furthermore, we propose a feature recycling mechanism that maximizes computational efficiency by reusing the outputs of predecessor layers. Additionally, a target-aware self-distillation strategy is designed to enhance the discriminative capability of early-stage predictions by mimicking the representation patterns of the deep model. Extensive experiments demonstrate that DyTrack achieves promising speed-precision tradeoffs with only a single model. For instance, DyTrack obtains 64.9% area under the curve (AUC) on LaSOT at a speed of 256 fps.
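The core control flow described above (terminating branches at intermediate layers, with predecessor outputs reused by later stages) can be sketched as follows. This is an illustrative toy, not the authors' implementation: `run_block`, `exit_confidence`, and the fixed threshold are hypothetical stand-ins for DyTrack's transformer blocks and learned early-exit heads.

```python
def run_block(features, depth):
    # Stand-in for one transformer block: deeper blocks refine features more.
    return [f + 0.1 * depth for f in features]

def exit_confidence(features):
    # Stand-in for a terminating branch's halting score in [0, 1].
    return sum(features) / len(features)

def dynamic_forward(features, num_blocks=4, threshold=0.5):
    """Run blocks sequentially; stop at the first terminating branch whose
    confidence clears the threshold, so easy frames exit early and hard
    frames traverse the full depth. Returns (features, depth used)."""
    for depth in range(1, num_blocks + 1):
        # Feature recycling: each block consumes its predecessor's output
        # rather than recomputing from scratch.
        features = run_block(features, depth)
        if exit_confidence(features) >= threshold:
            return features, depth  # early exit on an easy frame
    return features, num_blocks  # full-depth route for a hard frame

# An "easy" frame exits after one block; a "hard" frame needs three.
_, easy_depth = dynamic_forward([0.4, 0.6])
_, hard_depth = dynamic_forward([0.0, 0.0])
```

Per frame, the cost is proportional to the exit depth, which is how a single model trades compute for precision on a frame-by-frame basis.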