Multi-State Tracker: Enhancing Efficient Object Tracking via Multi-State Specialization and Interaction

📅 2025-08-15
🤖 AI Summary
Efficient trackers often sacrifice representational capacity by simplifying feature extraction, so single-layer features model target states poorly. To address this, the authors propose the Multi-State Tracker (MST), which generates complementary state representations at multiple stages of feature extraction. A lightweight State-Specific Enhancement (SSE) module boosts the discriminability of each state, and a Cross-State Interaction (CSI) module fuses the states efficiently and adaptively. Both modules build on a hidden state adaptation-based state space duality (HSA-SSD) design. MST improves robustness in complex scenarios at low cost: on GOT-10K it gains 4.5% in Average Overlap (AO) over HCAT while the SSE and CSI modules add only 0.1 GFLOPs and 0.66M parameters, preserving real-time performance.

📝 Abstract
Efficient trackers achieve faster runtime by reducing computational complexity and model parameters. However, this efficiency often comes at the expense of weakened feature representation capacity, which limits their ability to accurately capture target states using single-layer features. To overcome this limitation, we propose Multi-State Tracker (MST), which utilizes highly lightweight state-specific enhancement (SSE) to perform specialized enhancement on multi-state features produced by multi-state generation (MSG) and aggregates them in an interactive and adaptive manner using cross-state interaction (CSI). This design greatly enhances feature representation while incurring minimal computational overhead, leading to improved tracking robustness in complex environments. Specifically, the MSG generates multiple state representations at multiple stages during feature extraction, while SSE refines them to highlight target-specific features. The CSI module facilitates information exchange between these states and ensures the integration of complementary features. Notably, the introduced SSE and CSI modules adopt a highly lightweight hidden state adaptation-based state space duality (HSA-SSD) design, incurring only 0.1 GFLOPs in computation and 0.66 M parameters. Experimental results demonstrate that MST outperforms all previous efficient trackers across multiple datasets, significantly improving tracking accuracy and robustness. In particular, it shows excellent runtime performance while improving the AO score by 4.5% over the previous SOTA efficient tracker HCAT on the GOT-10K dataset. The code is available at https://github.com/wsumel/MST.
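For readers unfamiliar with the AO metric cited above, the following is a minimal sketch of how an average overlap score can be computed: the intersection-over-union (IoU) between predicted and ground-truth boxes, averaged over frames. The `(x, y, width, height)` box format and the function names `iou` and `average_overlap` are assumptions made for illustration; the official GOT-10K evaluation may differ in details that are not reproduced here.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x, y, width, height).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(preds: List[Box], gts: List[Box]) -> float:
    """Mean IoU over paired predicted and ground-truth boxes."""
    if not preds or len(preds) != len(gts):
        raise ValueError("preds and gts must be non-empty and of equal length")
    return sum(iou(p, g) for p, g in zip(preds, gts)) / len(preds)
```

Called with one list of predicted boxes and one list of ground-truth boxes of the same length, `average_overlap` returns a value between 0 and 1; higher means the predictions cover the target more tightly.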
Problem

Research questions and friction points this paper is trying to address.

Enhances object tracking with multi-state feature specialization
Improves tracking robustness in complex environments efficiently
Reduces computational overhead while boosting feature representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-State Generation for diverse feature extraction
State-Specific Enhancement to refine target features
Cross-State Interaction for adaptive feature fusion
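The data flow through the three components listed above can be sketched as follows. This is only an illustration of the generate, enhance, fuse pattern, not the authors' implementation: the actual SSE and CSI modules use the HSA-SSD design, which is not reproduced here. The residual rescaling in `state_specific_enhancement` and the softmax-weighted sum in `cross_state_interaction` are hypothetical stand-ins, and all function names are assumptions.

```python
import math
from typing import Callable, List

Vector = List[float]


def multi_state_generation(x: Vector, stages: List[Callable[[Vector], Vector]]) -> List[Vector]:
    """Run the stages in sequence and keep every intermediate output as one state."""
    states = []
    for stage in stages:
        x = stage(x)
        states.append(x)
    return states


def state_specific_enhancement(state: Vector, scale: float) -> Vector:
    """Hypothetical stand-in for SSE: a residual rescaling with one scale per state."""
    return [v + scale * v for v in state]


def cross_state_interaction(states: List[Vector]) -> Vector:
    """Hypothetical stand-in for CSI: softmax-weighted sum of the states.

    Each state's weight is derived from its mean activation.
    """
    scores = [sum(s) / len(s) for s in states]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    return [sum(w * s[i] for w, s in zip(weights, states)) for i in range(dim)]


def track_features(x: Vector, stages: List[Callable[[Vector], Vector]], scales: List[float]) -> Vector:
    """Generate states, enhance each one separately, then fuse them."""
    states = multi_state_generation(x, stages)
    enhanced = [state_specific_enhancement(s, k) for s, k in zip(states, scales)]
    return cross_state_interaction(enhanced)
```

The point of the sketch is the structure: each stage contributes its own state, every state gets its own enhancement parameters, and the fusion weights depend on the states themselves rather than being fixed.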
Shilei Wang
Northwestern Polytechnical University
Gong Cheng
Professor, Nanjing University
big data search, knowledge graph, LLM inference
Pujian Lai
Northwestern Polytechnical University
object tracking, interpretable machine learning
Dong Gao
School of Automation, Northwestern Polytechnical University, Xi’an, Shaanxi, China
Junwei Han
School of Automation, Northwestern Polytechnical University, Xi’an, Shaanxi, China