Towards Low-Latency Event Stream-based Visual Object Tracking: A Slow-Fast Approach

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the performance bottlenecks of low-frame-rate RGB camera-based trackers in latency-critical and resource-constrained scenarios, this paper proposes SFTrack, an event-camera-driven slow-fast dual-modal tracking framework. The slow tracker ensures high accuracy, while the fast tracker employs a lightweight graph neural network backbone, FlashAttention, and a single-forward multi-box detection head to achieve millisecond-level latency. The paper introduces, for the first time, an event-stream graph representation learning scheme and a dual-FlashAttention collaborative architecture, enabling a seamless accuracy–latency trade-off. The fast tracker is jointly optimized via supervised fine-tuning and knowledge distillation from the slow tracker. Evaluated on the FE240, COESOT, and EventVOT benchmarks, the fast tracker reduces latency by 62% while maintaining accuracy close to that of the slow tracker, significantly improving the real-time accuracy–latency trade-off. The code is publicly available.

📝 Abstract
Existing tracking algorithms typically rely on low-frame-rate RGB cameras coupled with computationally intensive deep neural network architectures to achieve effective tracking. However, such frame-based methods inherently face challenges in achieving low-latency performance and often fail in resource-constrained environments. Visual object tracking using bio-inspired event cameras has emerged as a promising research direction in recent years, offering distinct advantages for low-latency applications. In this paper, we propose a novel Slow-Fast Tracking paradigm that flexibly adapts to different operational requirements, termed SFTrack. The proposed framework supports two complementary modes, i.e., a high-precision slow tracker for scenarios with sufficient computational resources, and an efficient fast tracker tailored for latency-aware, resource-constrained environments. Specifically, our framework first performs graph-based representation learning from high-temporal-resolution event streams, and then integrates the learned graph-structured information into two FlashAttention-based vision backbones, yielding the slow and fast trackers, respectively. The fast tracker achieves low latency through a lightweight network design and by producing multiple bounding box outputs in a single forward pass. Finally, we seamlessly combine both trackers via supervised fine-tuning and further enhance the fast tracker's performance through a knowledge distillation strategy. Extensive experiments on public benchmarks, including FE240, COESOT, and EventVOT, demonstrate the effectiveness and efficiency of our proposed method across different real-world scenarios. The source code has been released on https://github.com/Event-AHU/SlowFast_Event_Track.
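The graph-based representation learning step described in the abstract can be illustrated with a minimal sketch: each event (x, y, t, polarity) becomes a graph node, connected to its k nearest neighbors in scaled space-time. This is a generic construction for intuition only; the paper's actual node features, distance metric, time scaling, and choice of k may differ.

```python
import numpy as np

def events_to_graph(events, k=8, time_scale=1e-3):
    """Build a k-NN graph from an event stream.

    events: (N, 4) array of (x, y, t, polarity).
    Returns node features (N, 4) and a directed edge list (N*k, 2).
    Illustrative sketch only; not the paper's exact scheme.
    """
    coords = events[:, :3].astype(np.float64).copy()
    coords[:, 2] *= time_scale  # bring timestamps to a scale comparable to pixels
    n = len(events)
    # Pairwise squared distances in (x, y, scaled t) space.
    diff = coords[:, None, :] - coords[None, :, :]
    dist2 = (diff ** 2).sum(-1)
    np.fill_diagonal(dist2, np.inf)  # forbid self-loops
    knn = np.argsort(dist2, axis=1)[:, :k]
    edges = np.stack([np.repeat(np.arange(n), k), knn.ravel()], axis=1)
    return events.astype(np.float64), edges

# Tiny synthetic stream: (x, y, t_microseconds, polarity)
ev = np.array([[10, 12, 100, 1],
               [11, 12, 120, 0],
               [50, 60, 130, 1],
               [10, 13, 150, 1]], dtype=np.float64)
feats, edges = events_to_graph(ev, k=2)
```

The resulting node features and edge list are what a graph neural network backbone (as in the fast tracker) would consume; in practice such graphs are built per time window over the high-temporal-resolution stream.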
Problem

Research questions and friction points this paper is trying to address.

Achieving low-latency visual object tracking with event cameras
Overcoming resource constraints in frame-based tracking methods
Adapting tracking precision to computational resource availability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Slow-Fast Tracking paradigm for flexible adaptation
Graph-based representation learning from event streams
Lightweight network design for low latency
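The slow-to-fast knowledge distillation mentioned above can be sketched as a combined training objective: the fast (student) tracker minimizes its supervised tracking loss plus a term pulling its features toward the slow (teacher) tracker's. The MSE imitation term, the weighting alpha, and the feature shapes below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def distill_loss(student_feat, teacher_feat, task_loss, alpha=0.5):
    """Combined objective for training the fast tracker.

    student_feat, teacher_feat: (B, D) features from the fast/slow trackers.
    task_loss: scalar supervised tracking loss (e.g. box regression).
    alpha weights supervision vs. imitation; the value is illustrative.
    """
    # Teacher features act as fixed targets (no gradient flows to the teacher).
    distill = np.mean((student_feat - teacher_feat) ** 2)
    return alpha * task_loss + (1.0 - alpha) * distill

# Toy example: student features far from the teacher's.
s = np.zeros((2, 4))
t = np.ones((2, 4))
loss = distill_loss(s, t, task_loss=0.2, alpha=0.5)
```

When the student matches the teacher exactly, the imitation term vanishes and only the supervised loss remains, which is what lets the fast tracker approach the slow tracker's accuracy at a fraction of the latency.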
Authors
Shiao Wang (Anhui University)
Xiao Wang (School of Computer Science and Technology, Anhui University, Hefei, China)
Liye Jin (School of Computer Science and Technology, Anhui University, Hefei, China)
Bo Jiang (School of Computer Science and Technology, Anhui University, Hefei, China)
Lin Zhu (Beijing Institute of Technology, Beijing, China)
Lan Chen (Communication University of China)
Yonghong Tian (Peng Cheng Laboratory, Shenzhen, China; School of Computer Science, Peking University, China; School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, China)
Bin Luo (School of Computer Science and Technology, Anhui University, Hefei, China)