SMTrack: End-to-End Trained Spiking Neural Networks for Multi-Object Tracking in RGB Videos

📅 2025-08-20
🤖 AI Summary
This work addresses the limited applicability of Spiking Neural Networks (SNNs) to Multi-Object Tracking (MOT) on standard RGB video, proposing the first end-to-end, directly trained deep SNN-based MOT framework. Methodologically: (1) it introduces an Adaptive Scale-Aware Normalized Wasserstein Distance Loss (Asa-NWDLoss), which adjusts its normalization factor to the average object size in each training batch to improve detection sensitivity to small objects; (2) it incorporates the TrackTrack identity association module to improve trajectory consistency. Evaluated on BEE24, MOT17, MOT20, and DanceTrack, the approach achieves accuracy comparable to leading Artificial Neural Network (ANN)-based methods while retaining the low-power potential of SNNs. The work extends directly trained SNNs beyond image classification, object detection, and event-based tracking to a complex temporal vision task on conventional RGB video, a step toward low-power MOT.


📝 Abstract
Brain-inspired Spiking Neural Networks (SNNs) exhibit significant potential for low-power computation, yet their application in visual tasks remains largely confined to image classification, object detection, and event-based tracking. In contrast, real-world vision systems still widely use conventional RGB video streams, where the potential of directly trained SNNs for complex temporal tasks such as multi-object tracking (MOT) remains underexplored. To address this challenge, we propose SMTrack, the first directly trained deep SNN framework for end-to-end multi-object tracking on standard RGB videos. SMTrack introduces an adaptive and scale-aware Normalized Wasserstein Distance loss (Asa-NWDLoss) to improve detection and localization performance under varying object scales and densities. Specifically, the method computes the average object size within each training batch and dynamically adjusts the normalization factor, thereby enhancing sensitivity to small objects. For the association stage, we incorporate the TrackTrack identity module to maintain robust and consistent object trajectories. Extensive evaluations on BEE24, MOT17, MOT20, and DanceTrack show that SMTrack achieves performance on par with leading ANN-based MOT methods, advancing robust and accurate SNN-based tracking in complex scenarios.
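The abstract describes Asa-NWDLoss only at a high level. The following Python sketch builds on the standard Normalized Wasserstein Distance, in which each box (cx, cy, w, h) is treated as a 2D Gaussian and NWD = exp(-W2 / C), and replaces the fixed constant C with a per-batch value. The rule used for C here (the mean of sqrt(w * h) over the batch's ground-truth boxes) is an assumption, the paper's exact normalization may differ, and the names `wasserstein2`, `adaptive_constant`, and `asa_nwd_loss` are hypothetical.

```python
import math
from typing import List, Tuple

# A box is (cx, cy, w, h): centre coordinates, width, height (pixels).
Box = Tuple[float, float, float, float]


def wasserstein2(a: Box, b: Box) -> float:
    """2-Wasserstein distance between two boxes modelled as 2D Gaussians.

    Each box maps to N(mu=(cx, cy), Sigma=diag(w^2/4, h^2/4)); for such
    diagonal Gaussians the distance has a closed form over (cx, cy, w/2, h/2).
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return math.sqrt(
        (ax - bx) ** 2 + (ay - by) ** 2
        + (aw / 2 - bw / 2) ** 2 + (ah / 2 - bh / 2) ** 2
    )


def adaptive_constant(gt_boxes: List[Box]) -> float:
    """HYPOTHETICAL rule: C = mean of sqrt(w * h) over the batch's ground-truth boxes."""
    if not gt_boxes:
        raise ValueError("batch must contain at least one ground-truth box")
    return sum(math.sqrt(w * h) for _, _, w, h in gt_boxes) / len(gt_boxes)


def asa_nwd_loss(preds: List[Box], gts: List[Box]) -> float:
    """Mean of 1 - NWD over matched (prediction, ground truth) pairs in one batch."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must be matched one-to-one")
    c = adaptive_constant(gts)  # recomputed per batch instead of a fixed dataset constant
    losses = [1.0 - math.exp(-wasserstein2(p, g) / c) for p, g in zip(preds, gts)]
    return sum(losses) / len(losses)


# Same 3-pixel centre offset, evaluated in a batch of small vs. large objects.
small = asa_nwd_loss([(13.0, 10.0, 4.0, 4.0)], [(10.0, 10.0, 4.0, 4.0)])
large = asa_nwd_loss([(13.0, 10.0, 40.0, 40.0)], [(10.0, 10.0, 40.0, 40.0)])
print(f"small-object batch: {small:.4f}, large-object batch: {large:.4f}")
# prints: small-object batch: 0.5276, large-object batch: 0.0723
```

Under this assumed rule, the same localization error costs roughly seven times more when the batch consists of small objects, which illustrates the kind of small-object sensitivity the abstract attributes to the adaptive normalization factor.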
Problem

Research questions and friction points this paper is trying to address.

Enabling SNNs for multi-object tracking in standard RGB videos
Addressing varying object scales and densities in detection
Maintaining consistent object identities across video frames
Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end trained SNN for RGB video tracking
Adaptive scale-aware loss for object detection
Identity module for consistent object trajectories
Pengzhi Zhong
College of Computer Science and Engineering, Guilin University of Technology, China, 541006 and Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, China, 541004
Xinzhe Wang
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
Dan Zeng
Sun Yat-sen University
Biometrics, computer vision, deep learning
Qihua Zhou
Shenzhen University
Edge AI Systems, Tiny Machine Learning, On-Device Learning, Distributed Machine Learning
Feixiang He
Central South University
Crowd Behaviour Understanding, Computer Vision, Computer Graphics, Deep Learning
Shuiwang Li
Guilin University of Technology