🤖 AI Summary
Efficient trackers often sacrifice representational capacity by simplifying feature extraction, so single-layer features model target states inadequately. To address this, we propose the Multi-State Tracker (MST), which generates complementary state representations at multiple stages of feature extraction. A lightweight State-Specific Enhancement (SSE) module boosts feature discriminability, and a Cross-State Interaction (CSI) module fuses the states efficiently and adaptively; both are built on a hidden state adaptation-based state space duality (HSA-SSD) design. MST improves robustness in complex scenarios with minimal overhead: on GOT-10K, it achieves a 4.5% gain in Average Overlap (AO) over HCAT while adding only 0.1 GFLOPs and 0.66M parameters, balancing high accuracy with real-time performance.
📝 Abstract
Efficient trackers achieve faster runtime by reducing computational complexity and model parameters. However, this efficiency often comes at the expense of weakened feature representation capacity, limiting their ability to accurately capture target states using single-layer features. To overcome this limitation, we propose the Multi-State Tracker (MST), which uses a highly lightweight state-specific enhancement (SSE) module to perform specialized enhancement on the multi-state features produced by multi-state generation (MSG), and aggregates them in an interactive and adaptive manner using cross-state interaction (CSI). This design greatly enhances feature representation while incurring minimal computational overhead, leading to improved tracking robustness in complex environments. Specifically, MSG generates multiple state representations at multiple stages during feature extraction, while SSE refines them to highlight target-specific features. The CSI module facilitates information exchange between these states and ensures the integration of complementary features. Notably, the SSE and CSI modules adopt a highly lightweight hidden state adaptation-based state space duality (HSA-SSD) design, adding only 0.1 GFLOPs of computation and 0.66 M parameters. Experimental results demonstrate that MST outperforms all previous efficient trackers across multiple datasets, significantly improving tracking accuracy and robustness. It also shows excellent runtime performance while improving the AO score by 4.5% over the previous SOTA efficient tracker HCAT on the GOT-10K dataset. The code is available at https://github.com/wsumel/MST.
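The data flow described above (MSG collects one state per extraction stage, SSE refines each state, CSI fuses them adaptively) can be illustrated with a toy sketch. This is not the authors' implementation and does not reproduce the HSA-SSD design: the function names `generate_states`, `enhance_state`, and `fuse_states` are hypothetical, the "stages" are a fixed nonlinearity rather than a real backbone, the enhancement is a plain sigmoid gate, and the fusion is a softmax-weighted sum. It only shows the shape of the pipeline under those assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def generate_states(x, num_stages=3):
    """Toy stand-in for MSG: keep the output of every stage, not only the last.

    Each 'stage' is a fixed tanh transform (an assumption made for brevity;
    a real tracker would use backbone blocks here).
    """
    states = []
    feat = list(x)
    for stage in range(num_stages):
        feat = [math.tanh(v + 0.1 * (stage + 1)) for v in feat]
        states.append(feat)
    return states


def enhance_state(state):
    """Toy stand-in for SSE: gate each channel by the sigmoid of its own value."""
    return [v / (1.0 + math.exp(-v)) for v in state]


def fuse_states(states):
    """Toy stand-in for CSI: weight each state by a softmax over its mean
    activation, then take the weighted sum across states."""
    scores = [sum(s) / len(s) for s in states]
    weights = softmax(scores)
    dim = len(states[0])
    fused = [sum(w * s[i] for w, s in zip(weights, states)) for i in range(dim)]
    return fused, weights


if __name__ == "__main__":
    features = [0.5, -1.0, 2.0, 0.0]  # hypothetical 4-channel input feature
    states = [enhance_state(s) for s in generate_states(features)]
    fused, weights = fuse_states(states)
    print("per-state weights:", weights)
    print("fused feature:", fused)
```

The point of the sketch is the structure: intermediate stage outputs are retained as separate states, each is refined independently, and the fusion weights depend on the states themselves rather than being fixed.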