Two-stream Beats One-stream: Asymmetric Siamese Network for Efficient Visual Tracking

📅 2025-03-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address template redundancy and excessive semantic interaction overhead in single-stream Siamese trackers deployed on resource-constrained edge devices, this paper proposes AsymTrack—a heterogeneous dual-stream tracker. Methodologically, it introduces: (1) a novel heterogeneous dual-stream architecture that decouples the template and search branches; (2) a unidirectional template modulation mechanism, generating only a lightweight modulation signal during initialization for efficient feature enhancement; and (3) a perception-enhancement module integrating abstract semantics and local details. Evaluated on LaSOT, AsymTrack-T achieves 60.8% AUC, with inference speeds of 224 FPS (GPU), 81 FPS (CPU), and 84 FPS (Jetson AGX). Its AUC surpasses HiT-Tiny by 6.0%, significantly improving the accuracy–speed trade-off for edge deployment.

📝 Abstract
Efficient tracking has garnered attention for its ability to operate on resource-constrained platforms for real-world deployment beyond desktop GPUs. Current efficient trackers mainly follow precision-oriented trackers, adopting a one-stream framework with lightweight modules. However, blindly adhering to the one-stream paradigm may not be optimal, as incorporating template computation in every frame leads to redundancy, and pervasive semantic interaction between the template and search region places stress on edge devices. In this work, we propose a novel asymmetric Siamese tracker named AsymTrack for efficient tracking. AsymTrack disentangles the template and search streams into separate branches, with the template computed only once during initialization to generate modulation signals. Building on this architecture, we devise an efficient template modulation mechanism to unidirectionally inject crucial cues into the search features, and design an object perception enhancement module that integrates abstract semantics and local details to overcome the limited representation of lightweight trackers. Extensive experiments demonstrate that AsymTrack offers superior speed-precision trade-offs across different platforms compared to the current state of the art. For instance, AsymTrack-T achieves 60.8% AUC on LaSOT and 224/81/84 FPS on GPU/CPU/AGX, surpassing HiT-Tiny by 6.0% AUC with higher speeds. The code is available at https://github.com/jiawen-zhu/AsymTrack.
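The core idea in the abstract — compute the template branch once at initialization, then cheaply inject its cues into the per-frame search features — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy feature shapes, and the FiLM-style channel-wise scale-and-shift form of the modulation are all assumptions for clarity.

```python
# Hedged sketch of unidirectional template modulation (illustrative only;
# names and the scale/shift operator are assumptions, not AsymTrack's code).

def encode_template(template_feat):
    """Run ONCE at initialization: reduce template features to a
    lightweight per-channel modulation signal (scale, shift)."""
    scales = [1.0 + sum(ch) / len(ch) for ch in template_feat]
    shifts = [max(ch) for ch in template_feat]
    return scales, shifts

def modulate_search(search_feat, signal):
    """Per frame: inject template cues into search features via a
    channel-wise scale-and-shift; no template recomputation and no
    bidirectional template-search attention is needed."""
    scales, shifts = signal
    return [[s * x + b for x in ch]
            for ch, s, b in zip(search_feat, scales, shifts)]

# Initialization: the template branch runs a single time.
template_feat = [[0.2, 0.4], [0.1, 0.3]]        # 2 channels x 2 values
signal = encode_template(template_feat)

# Tracking loop: only the search branch plus the cheap modulation runs.
for frame_feat in ([[1.0, 2.0], [3.0, 4.0]],):
    fused = modulate_search(frame_feat, signal)
```

The asymmetry is the point: the expensive template encoding sits outside the tracking loop, so the per-frame cost is just the search branch plus a linear modulation, which is what enables the reported CPU and Jetson AGX speeds.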
Problem

Research questions and friction points this paper is trying to address.

Visual tracking must run on resource-constrained platforms beyond desktop GPUs.
One-stream trackers recompute the template every frame, introducing redundancy.
Pervasive template–search semantic interaction strains edge devices and limits lightweight representations.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Asymmetric Siamese network for visual tracking
Template modulation mechanism for efficiency
Object perception enhancement module integration
Jiawen Zhu
Dalian University of Technology
computer vision · object tracking · multi-modal learning
Huayi Tang
University of Pennsylvania, Philadelphia, USA
Xin Chen
Dalian University of Technology, Dalian, China
Xinying Wang
Dalian University of Technology, Dalian, China
Dong Wang
Dalian University of Technology, Dalian, China
Huchuan Lu
Dalian University of Technology, Dalian, China