Unleashing Temporal Capacity of Spiking Neural Networks through Spatiotemporal Separation

📅 2025-12-05
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing studies struggle to isolate the functional role of membrane potential propagation in complex temporal modeling within spiking neural networks (SNNs), and the tight coupling between spatial semantics and temporal dynamics induces spatiotemporal resource competition. To address this, the authors propose STSep, a novel decoupled architecture that, for the first time in SNNs, separates spatial and temporal processing into distinct branches. Through an analysis of non-stateful models, in which membrane potential propagation is progressively removed, they find that moderate suppression of membrane potential propagation improves performance, revealing an intrinsic trade-off in spatiotemporal encoding. They further design a spatial-temporal separable residual block that integrates explicit temporal differencing with spike-based attention for efficient dynamic modeling. STSep achieves state-of-the-art accuracy on Something-Something V2, UCF101, and HMDB51. Retrieval experiments and attention visualization confirm that it focuses on motion-related features rather than static appearance.

📝 Abstract
Spiking Neural Networks (SNNs) are considered naturally suited for temporal processing, with membrane potential propagation widely regarded as the core temporal modeling mechanism. However, existing research lacks analysis of its actual contribution in complex temporal tasks. We design Non-Stateful (NS) models that progressively remove membrane propagation to quantify its stage-wise role. Experiments reveal a counterintuitive phenomenon: moderate removal in shallow or deep layers improves performance, while excessive removal causes collapse. We attribute this to spatio-temporal resource competition, in which neurons encode both semantics and dynamics within a limited range, so that temporal state consumes capacity needed for spatial learning. Based on this, we propose the Spatial-Temporal Separable Network (STSep), which decouples residual blocks into independent spatial and temporal branches. The spatial branch focuses on semantic extraction, while the temporal branch captures motion through explicit temporal differences. Experiments on Something-Something V2, UCF101, and HMDB51 show that STSep achieves superior performance, with a retrieval task and attention analysis confirming its focus on motion rather than static appearance. This work provides new perspectives on SNNs' temporal mechanisms and an effective solution for spatiotemporal modeling in video understanding.
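To make the idea of removing membrane propagation concrete, here is a minimal sketch of a single leaky integrate-and-fire (LIF) neuron that can run with or without state carried across time steps. It is an illustration only, not the paper's implementation: the function name `lif_forward`, the `stateful` flag, the update rule, and the values of `tau` and `v_threshold` are all assumptions chosen for the example.

```python
def lif_forward(inputs, stateful=True, tau=2.0, v_threshold=1.0):
    """Run one LIF neuron over a sequence of input currents.

    stateful=True : the membrane potential carries over between time steps.
    stateful=False: the membrane potential is cleared before every step,
                    a hypothetical stand-in for a "Non-Stateful" neuron.
    All parameter names and values are assumptions for this sketch.
    """
    v = 0.0
    spikes = []
    for x in inputs:
        if not stateful:
            v = 0.0  # discard temporal state: no membrane propagation
        v = v + (x - v) / tau  # leaky integration toward the input
        spike = 1 if v >= v_threshold else 0
        spikes.append(spike)
        if spike:
            v = 0.0  # hard reset after firing
    return spikes


# A constant sub-threshold-per-step input fires only when state accumulates.
print(lif_forward([1.5, 1.5, 1.5], stateful=True))   # [0, 1, 0]
print(lif_forward([1.5, 1.5, 1.5], stateful=False))  # [0, 0, 0]
```

In this toy setting the stateful neuron fires because potential accumulates over two steps, while the non-stateful neuron responds only to the current input. How the paper's NS models remove propagation at each network stage is not specified in this summary.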
Problem

Research questions and friction points this paper is trying to address.

Quantify membrane propagation's role in SNNs' temporal tasks
Address spatio-temporal resource competition in neural encoding
Decouple spatial and temporal branches for video understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupling spatial and temporal branches in networks
Using explicit temporal differences for motion capture
Separating semantic extraction from dynamic modeling
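The points above can be sketched in a few lines of plain Python. This is a hedged illustration under stated assumptions, not the STSep block itself: the helper names `temporal_difference` and `separable_block`, the zero-padding of the first frame, and the residual-style sum of the two branches are all hypothetical choices, and the spike-based attention mentioned in the summary is omitted.

```python
def temporal_difference(frames):
    """Return frame-to-frame differences for a clip.

    frames: list of T frames, each a flat list of feature values.
    The first output is all zeros so the result also has length T
    (a padding choice made for this sketch).
    """
    if not frames:
        return []
    diffs = [[0.0] * len(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        diffs.append([c - p for p, c in zip(prev, cur)])
    return diffs


def separable_block(frames, spatial_fn, temporal_fn):
    """Apply a per-frame spatial branch and a difference-based temporal branch."""
    spatial = [spatial_fn(f) for f in frames]
    temporal = [temporal_fn(d) for d in temporal_difference(frames)]
    # Residual-style sum of input, spatial, and temporal outputs (an assumption).
    return [
        [x + s + t for x, s, t in zip(f, sp, tp)]
        for f, sp, tp in zip(frames, spatial, temporal)
    ]


clip = [[1.0, 2.0], [1.0, 2.0], [3.0, 2.0]]
print(temporal_difference(clip))  # [[0.0, 0.0], [0.0, 0.0], [2.0, 0.0]]
```

Static content produces zero differences, so in this sketch the temporal branch only receives signal where something changes between frames, while the spatial branch sees each frame on its own.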