Fire on Motion: Optimizing Video Pass-bands for Efficient Spiking Action Recognition

📅 2026-01-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem that spiking neural networks (SNNs), whose dynamics act as a temporal low-pass filter, suppress high-frequency dynamics and therefore struggle to model motion in video. For the first time, the study identifies a temporal pass-band mismatch in SNNs applied to video tasks and proposes a plug-and-play Pass-Bands Optimizer (PBO) module. With only two learnable parameters and a lightweight consistency constraint, PBO shifts the temporal pass-band toward task-relevant motion frequencies, suppressing static content and emphasizing motion-bearing signals. The approach improves performance on UCF101 by more than 10 percentage points without altering the underlying network architecture, and it yields consistent improvements on multimodal action recognition and weakly supervised video anomaly detection.

📝 Abstract

Spiking neural networks (SNNs) have gained traction in vision due to their energy efficiency, bio-plausibility, and inherent temporal processing. Yet, despite this temporal capacity, most progress concentrates on static image benchmarks, and SNNs still underperform artificial neural networks (ANNs) on dynamic video tasks. In this work, we diagnose a fundamental pass-band mismatch: standard spiking dynamics behave as a temporal low-pass filter that emphasizes static content while attenuating the motion-bearing bands where task-relevant information concentrates in dynamic tasks. This phenomenon explains why SNNs can approach ANNs on static tasks yet fall behind on tasks that demand richer temporal understanding. To remedy this, we propose the Pass-Bands Optimizer (PBO), a plug-and-play module that optimizes the temporal pass-band toward task-relevant motion bands. PBO introduces only two learnable parameters and a lightweight consistency constraint that preserves semantics and boundaries; it incurs negligible computational overhead and requires no architectural changes. PBO deliberately suppresses static components that contribute little to discrimination, effectively high-pass filtering the stream so that spiking activity concentrates on motion-bearing content. On UCF101, PBO yields an improvement of over ten percentage points. On more complex multi-modal action recognition and weakly supervised video anomaly detection, PBO delivers consistent and significant gains, offering a new perspective for SNN-based video processing and understanding.
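
The abstract does not give PBO's equations, so the following is only an illustrative sketch of the pass-band idea, not the authors' method. It assumes a plain leaky integrator (threshold and reset omitted) as a stand-in for spiking membrane dynamics, and a hypothetical two-parameter temporal filter, `motion_emphasis`, with assumed parameters `alpha` (weight on the raw frame) and `beta` (weight on the frame difference). All function names, parameter names, and values here are assumptions chosen for illustration.

```python
import math


def leaky_integrate(signal, decay=0.9):
    """Leaky integrator: v[t] = decay * v[t-1] + (1 - decay) * x[t].

    Assumption: threshold and reset are omitted so that only the
    linear low-pass behaviour of the membrane is visible.
    """
    v, out = 0.0, []
    for x in signal:
        v = decay * v + (1.0 - decay) * x
        out.append(v)
    return out


def motion_emphasis(signal, alpha=0.0, beta=1.0):
    """Hypothetical two-parameter filter (NOT the paper's PBO):
    y[t] = alpha * x[t] + beta * (x[t] - x[t-1]).

    alpha weights the static content, beta weights temporal change.
    """
    out, prev = [], signal[0]
    for x in signal:
        out.append(alpha * x + beta * (x - prev))
        prev = x
    return out


def amplitude(signal, skip=100):
    """Half the peak-to-peak range after discarding the initial transient."""
    tail = signal[skip:]
    return (max(tail) - min(tail)) / 2.0


def sine(freq, n=400):
    """Unit-amplitude sine with `freq` cycles per time step."""
    return [math.sin(2.0 * math.pi * freq * t) for t in range(n)]


slow, fast = sine(0.01), sine(0.25)   # slow drift vs. fast motion
static = [1.0] * 400                  # a pixel that never changes

# The leaky integrator passes the slow signal and attenuates the fast one.
lp_slow = amplitude(leaky_integrate(slow))
lp_fast = amplitude(leaky_integrate(fast))

# The difference-based filter does the opposite and zeroes static input.
hp_slow = amplitude(motion_emphasis(slow))
hp_fast = amplitude(motion_emphasis(fast))
hp_static = motion_emphasis(static)

print(f"leaky integrator: slow={lp_slow:.3f} fast={lp_fast:.3f}")
print(f"motion emphasis:  slow={hp_slow:.3f} fast={hp_fast:.3f}")
```

In this toy setting the leaky integrator keeps most of the slow component and strongly attenuates the fast one, while the difference-based filter reverses that ordering and maps a constant input to zero. In a trainable version, `alpha` and `beta` would be learned rather than fixed, which is one plausible reading of "two learnable parameters" but is not confirmed by the abstract.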

Problem

Research questions and friction points this paper is trying to address.

Spiking Neural Networks

Temporal Processing

Pass-band Mismatch

Action Recognition

Video Understanding

Innovation

Methods, ideas, or system contributions that make the work stand out.

Spiking Neural Networks

Temporal Pass-band Optimization

Motion-aware Video Recognition