🤖 AI Summary
This work addresses a limitation of spiking neural networks (SNNs): their inherently low-pass temporal dynamics suppress high-frequency content, which makes it hard for them to model motion in video. For the first time, the study identifies a temporal pass-band mismatch in SNNs applied to video tasks and proposes a plug-and-play module, the Pass-Bands Optimizer (PBO). With only two learnable parameters, PBO adaptively emphasizes task-relevant motion frequencies, using a lightweight consistency constraint and strengthening high-frequency motion signals. Notably, the approach achieves gains of over ten percentage points on UCF101 without altering the underlying network architecture, and it shows consistent improvements on multi-modal action recognition and weakly supervised video anomaly detection.
📝 Abstract
Spiking neural networks (SNNs) have gained traction in vision due to their energy efficiency, bio-plausibility, and inherent temporal processing. Yet, despite this temporal capacity, most progress concentrates on static image benchmarks, and SNNs still underperform artificial neural networks (ANNs) on dynamic video tasks. In this work, we diagnose a fundamental pass-band mismatch: standard spiking dynamics behave as a temporal low-pass filter that emphasizes static content while attenuating the motion-bearing bands where task-relevant information concentrates in dynamic tasks. This phenomenon explains why SNNs can approach ANNs on static tasks yet fall behind on tasks that demand richer temporal understanding. To remedy this, we propose the Pass-Bands Optimizer (PBO), a plug-and-play module that shifts the temporal pass-band toward task-relevant motion bands. PBO introduces only two learnable parameters and a lightweight consistency constraint that preserves semantics and boundaries; it incurs negligible computational overhead and requires no architectural changes. PBO deliberately suppresses static components that contribute little to discrimination, effectively high-pass filtering the stream so that spiking activity concentrates on motion-bearing content. On UCF101, PBO yields an improvement of over ten percentage points. On more complex multi-modal action recognition and weakly supervised video anomaly detection, PBO delivers consistent and significant gains, offering a new perspective for SNN-based video processing and understanding.
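The low-pass behavior described above can be illustrated with a small numerical sketch. It assumes a leaky membrane update u[t] = decay * u[t-1] + (1 - decay) * x[t] with thresholding and reset ignored, which is a common simplification and not necessarily the neuron model used in the paper. The two-parameter pre-filter (`a`, `b`) is a hypothetical stand-in for the idea of re-weighting static and motion content; it is not the PBO formulation, which the abstract does not specify.

```python
import numpy as np

def leaky_gain(omega, decay):
    """Magnitude response of u[t] = decay * u[t-1] + (1 - decay) * x[t].

    omega is the temporal frequency in radians per time step (0 = static).
    Spiking threshold and reset are ignored (assumption).
    """
    return (1.0 - decay) / np.abs(1.0 - decay * np.exp(-1j * omega))

def emphasized_gain(omega, decay, a, b):
    """Gain when a hypothetical pre-filter precedes the leaky integrator.

    Pre-filter (illustrative only): y[t] = a * x[t] + b * (x[t] - x[t-1]),
    where a scales the raw frame and b scales the frame difference.
    """
    pre = np.abs(a + b * (1.0 - np.exp(-1j * omega)))
    return pre * leaky_gain(omega, decay)

if __name__ == "__main__":
    decay = 0.5  # hypothetical membrane decay factor
    for omega in (0.0, np.pi / 2, np.pi):
        plain = leaky_gain(omega, decay)
        shifted = emphasized_gain(omega, decay, a=0.2, b=1.0)
        print(f"omega={omega:.2f}  leaky={plain:.3f}  with pre-filter={shifted:.3f}")
```

Under these assumptions the plain leaky integrator has its largest gain at omega = 0 (static content) and its smallest at the highest temporal frequency, while a small `a` and a larger `b` reverse that ordering. Making `a` and `b` trainable would be one way to let a network choose its own pass-band, but that design is only a guess at the spirit of the method.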