Dynamic Pondering Sparsity-aware Mixture-of-Experts Transformer for Event Stream based Visual Object Tracking

📅 2026-05-07

🤖 AI Summary
Existing event-based object tracking methods overlook the spatial sparsity and dynamically varying temporal density of event data, and their reliance on a single fixed time window for sampling limits adaptability under varying motion. This work proposes a sparsity-aware tracking framework that progressively injects sparse, medium-density, and dense event search regions into a three-stage Vision Transformer backbone. The approach integrates a sparsity-aware Mixture-of-Experts (MoE) module, which encourages experts to specialize on different sparsity patterns, and a dynamic pondering mechanism that adjusts inference depth according to tracking difficulty. To our knowledge, this is the first method in event-based tracking to jointly incorporate multi-scale event density modeling, sparsity-aware MoE, and dynamic computation path selection based on input sparsity and task difficulty. On the FE240hz, COESOT, and EventVOT benchmarks, it reports a favorable trade-off between tracking accuracy and computational efficiency.
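To make the idea of event density at several temporal scales concrete, here is a minimal sketch in plain Python. It is an illustration only and not the authors' code: the `(x, y, t, polarity)` event layout, the microsecond integer timestamps, the function names, and the definition of density as the fraction of pixels that fired at least once are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Assumed event layout: (x, y, timestamp in microseconds, polarity).
Event = Tuple[int, int, int, int]


def window_density(events: Sequence[Event], t_end: int, duration: int,
                   width: int, height: int) -> float:
    """Fraction of pixels that fired at least once in [t_end - duration, t_end]."""
    t_start = t_end - duration
    active = {(x, y) for (x, y, t, _polarity) in events if t_start <= t <= t_end}
    return len(active) / (width * height)


def multi_scale_densities(events: Sequence[Event], t_end: int,
                          durations: Sequence[int],
                          width: int, height: int) -> List[float]:
    """Density of the same stream under several window lengths (short to long)."""
    return [window_density(events, t_end, d, width, height) for d in durations]


# Toy stream on a 4x4 sensor; the values are made up for illustration.
events = [
    (2, 3, 1000, 1),
    (1, 0, 6000, -1),
    (1, 1, 9000, 1),
    (0, 0, 9500, 1),
    (3, 2, 9900, -1),
]
densities = multi_scale_densities(
    events, t_end=10000, durations=[1000, 5000, 10000], width=4, height=4
)
print(densities)  # [0.1875, 0.25, 0.3125]
```

Shorter windows yield sparser views of the same scene and longer windows yield denser ones, which is the kind of sparse, medium-density, and dense input the summary describes feeding to successive backbone stages.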
📝 Abstract
Despite significant progress, RGB-based trackers remain vulnerable to challenging imaging conditions, such as low illumination and fast motion. Event cameras offer a promising alternative by asynchronously capturing pixel-wise brightness changes, providing high dynamic range and high temporal resolution. However, existing event-based trackers often neglect the intrinsic spatial sparsity and temporal density of event data, while relying on a single fixed temporal-window sampling strategy that is suboptimal under varying motion dynamics. In this paper, we propose an event sparsity-aware tracking framework that explicitly models event-density variations across multiple temporal scales. Specifically, the proposed framework progressively injects sparse, medium-density, and dense event search regions into a three-stage Vision Transformer backbone, enabling hierarchical multi-density feature learning. Furthermore, we introduce a sparsity-aware Mixture-of-Experts module to encourage expert specialization under different sparsity patterns, and design a dynamic pondering strategy to adaptively adjust the inference depth according to tracking difficulty. Extensive experiments on FE240hz, COESOT, and EventVOT demonstrate that the proposed approach achieves a favorable trade-off between tracking accuracy and computational efficiency. The source code will be released at https://github.com/Event-AHU/OpenEvTracking.
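The abstract does not specify how the Mixture-of-Experts routing or the pondering strategy is implemented. The sketch below only illustrates the two general techniques the terms usually refer to: standard top-k softmax gating over expert logits, and an early-exit loop in the spirit of adaptive computation time, where stages run until an accumulated halting score crosses a threshold. The names `top_k_gate`, `ponder`, `halt_prob`, the threshold of 0.9, and the toy stages are all hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple


def top_k_gate(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Standard top-k MoE gating: keep the k largest logits, softmax over them."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def ponder(stages: Sequence[Callable[[float], float]],
           halt_prob: Callable[[float], float],
           x: float, threshold: float = 0.9) -> Tuple[float, int]:
    """Run stages in order; stop once the accumulated halting score >= threshold.

    Returns the final state and the number of stages actually executed.
    """
    cumulative = 0.0
    steps = 0
    for stage in stages:
        x = stage(x)
        steps += 1
        cumulative += halt_prob(x)
        if cumulative >= threshold:
            break  # an "easy" input exits early and skips the remaining stages
    return x, steps


# Hypothetical router logits for four experts; experts 0 and 2 tie for the top.
gate = top_k_gate([2.0, 0.0, 2.0, -1.0], k=2)

# Three toy stages and a constant halting score of 0.5 per stage.
stages = [lambda v: v + 1.0] * 3
state, steps = ponder(stages, halt_prob=lambda v: 0.5, x=0.0)
print(gate, state, steps)
```

In a sparsity-aware variant, the router logits would presumably depend on a density feature of the input, and the halting score on an estimate of tracking difficulty; both are left as plain callables here because the source gives no details.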
Problem

Research questions and friction points this paper is trying to address.

event-based tracking
spatial sparsity
temporal density
motion dynamics
temporal-window sampling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparsity-aware
Mixture-of-Experts
Dynamic Pondering
Event-based Tracking
Multi-density Feature Learning
Shiao Wang
Anhui University
Deep Learning
Xiao Wang
School of Computer Science and Technology, Anhui University, Hefei 230601, China
Duoqing Yang
School of Computer Science and Technology, Anhui University, Hefei 230601, China
Wenhao Zhang
School of Computer Science and Technology, Anhui University, Hefei 230601, China
Bo Jiang
Anhui University
Computer Vision and Pattern Recognition
Lin Zhu
Assistant Professor, School of Computer Science & Technology, Beijing Institute of Technology
Neuromorphic vision, Video processing, Event-based vision, Spiking neural network
Yonghong Tian
Peng Cheng Laboratory, Shenzhen, China; National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University, China; School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, China
Bin Luo
Anhui University, University of York
Pattern recognition, Digital image processing