ISTASTrack: Bridging ANN and SNN via ISTA Adapter for RGB-Event Tracking

📅 2025-09-12
🤖 AI Summary
This paper addresses key challenges in RGB-event heterogeneous tracking: the sparsity and asynchrony of event streams, and the modality and computational-paradigm mismatch between artificial neural networks (ANNs) and spiking neural networks (SNNs). It proposes ISTASTrack, the first Transformer-based ANN-SNN hybrid tracker. Its core contributions are: (1) an Iterative Shrinkage-Thresholding Algorithm (ISTA)-inspired adapter grounded in sparse representation theory, enabling bidirectional feature interaction between ANN and SNN branches; (2) a temporal downsampling attention module that aligns heterogeneous temporal features across RGB frames and event streams; and (3) an end-to-end trainable architecture jointly leveraging vision Transformers (for spatial context modeling) and spiking Transformers (for spatiotemporal dynamics modeling). ISTASTrack achieves state-of-the-art performance on the FE240hz, VisEvent, COESOT, and FELT benchmarks while maintaining high energy efficiency, supporting the efficacy of hybrid ANN-SNN architectures for robust visual tracking.

📝 Abstract
RGB-Event tracking has become a promising trend in visual object tracking to leverage the complementary strengths of both RGB images and dynamic spike events for improved performance. However, existing artificial neural networks (ANNs) struggle to fully exploit the sparse and asynchronous nature of event streams. Recent efforts toward hybrid architectures combining ANNs and spiking neural networks (SNNs) have emerged as a promising solution in RGB-Event perception, yet effectively fusing features across heterogeneous paradigms remains a challenge. In this work, we propose ISTASTrack, the first transformer-based ANN-SNN hybrid Tracker equipped with ISTA adapters for RGB-Event tracking. The two-branch model employs a vision transformer to extract spatial context from RGB inputs and a spiking transformer to capture spatio-temporal dynamics from event streams. To bridge the modality and paradigm gap between ANN and SNN features, we systematically design a model-based ISTA adapter for bidirectional feature interaction between the two branches, derived from sparse representation theory by unfolding the iterative shrinkage thresholding algorithm. Additionally, we incorporate a temporal downsampling attention module within the adapter to align multi-step SNN features with single-step ANN features in the latent space, improving temporal fusion. Experimental results on RGB-Event tracking benchmarks, such as FE240hz, VisEvent, COESOT, and FELT, have demonstrated that ISTASTrack achieves state-of-the-art performance while maintaining high energy efficiency, highlighting the effectiveness and practicality of hybrid ANN-SNN designs for robust visual tracking. The code is publicly available at https://github.com/lsying009/ISTASTrack.git.
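For readers unfamiliar with the algorithm the adapter is named after, the following is a minimal sketch of the classical (textbook) ISTA iteration for sparse coding. It is not the paper's adapter: the dictionary `D`, the signal `y`, the regularization weight `lam`, and the function names are all assumptions chosen for illustration. An unfolded network would, roughly speaking, turn a fixed number of these iterations into layers with learned parameters.

```python
import numpy as np


def soft_threshold(v, tau):
    """Element-wise shrinkage: the proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def ista(D, y, lam=0.1, n_iter=200):
    """Approximately minimise 0.5 * ||y - D x||^2 + lam * ||x||_1 (classical ISTA)."""
    # Lipschitz constant of the gradient of the quadratic term
    # (squared largest singular value of D).
    L = np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)                   # gradient step on the data term
        x = soft_threshold(x - grad / L, lam / L)  # shrinkage step on the L1 term
    return x


# Hypothetical toy problem: recover a 3-sparse code from 20 measurements.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17, 42]] = [1.5, -2.0, 1.0]
y = D @ x_true
x_hat = ista(D, y, lam=0.05, n_iter=500)
```

Each iteration alternates a gradient step with a soft-thresholding step, which is the structure that unfolding-based designs typically map onto network layers.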
Problem

Research questions and friction points this paper is trying to address.

Bridging ANN and SNN feature fusion for RGB-Event tracking
Exploiting sparse asynchronous event streams in visual tracking
Aligning heterogeneous temporal features between neural paradigms
Innovation

Methods, ideas, or system contributions that make the work stand out.

ISTA adapters bridge ANN and SNN features
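Vision transformer extracts spatial context from RGB; spiking transformer captures spatio-temporal dynamics from events
Temporal downsampling attention aligns multi-step SNN features with single-step ANN features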
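To make the alignment idea concrete, here is a rough sketch of one way to collapse multi-step features onto a single step with attention over time. It is an assumption-laden illustration, not the paper's module: the function name, the tensor shapes `(T, N, C)` and `(N, C)`, and the parameter-free dot-product scoring are all hypothetical.

```python
import numpy as np


def temporal_attention_pool(snn_feats, ann_feats):
    """Collapse the time axis of multi-step features via attention over time.

    snn_feats: (T, N, C) features over T time steps (hypothetical layout).
    ann_feats: (N, C) single-step features, used here as per-token queries.
    Returns an (N, C) array: a softmax-weighted average over the T steps.
    """
    T, N, C = snn_feats.shape
    # Scaled dot-product score between each token's query and its T step features.
    scores = np.einsum("tnc,nc->tn", snn_feats, ann_feats) / np.sqrt(C)
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)  # softmax over time
    return np.einsum("tn,tnc->nc", weights, snn_feats)


# Hypothetical sizes: 4 time steps, 6 tokens, 8 channels.
rng = np.random.default_rng(0)
snn = rng.standard_normal((4, 6, 8))
ann = rng.standard_normal((6, 8))
fused = temporal_attention_pool(snn, ann)
```

Because the weights form a softmax over the time axis, each output token is a convex combination of that token's features across the T steps, so the result has the same shape as the single-step features and can be fused with them directly.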
Authors
Siying Liu
Center for Brain Inspired Computing Research (CBICR), Department of Precision Instrument, Tsinghua University, Beijing, China
Zikai Wang
College of Computer Science and Technology, Taiyuan University of Technology, Shanxi, China
Hanle Zheng
Department of Precision Instrument, Tsinghua University
bio-inspired machine learning, deep learning
Yifan Hu
Center for Brain Inspired Computing Research (CBICR), Department of Precision Instrument, Tsinghua University, Beijing, China
Xilin Wang
Engineering Laboratory of Power Equipment Reliability in Complicated Coastal Environments, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Qingkai Yang
School of Automation, Beijing Institute of Technology, Beijing, China
Jibin Wu
The Hong Kong Polytechnic University
Spiking Neural Network, Neuromorphic Computing, Speech Processing, Cognitive Modelling
Hao Guo
College of Computer Science and Technology, Taiyuan University of Technology, Shanxi, China
Lei Deng
Center for Brain Inspired Computing Research (CBICR), Department of Precision Instrument, Tsinghua University, Beijing, China