🤖 AI Summary
To address the tension between the high energy consumption of graph convolutional networks (GCNs) and the difficulty spiking neural networks (SNNs) have in modeling skeletal temporal dynamics for skeleton-based action recognition, this paper proposes a low-power, high-accuracy multimodal spiking graph model. The method first formulates skeletal motion as discrete stochastic spike signals in the time-frequency domain. It then introduces three modules: 1D spiking graph convolution, frequency-domain spiking convolution, and multi-scale wavelet feature fusion. Together these enable event-driven joint spatiotemporal–spectral modeling. On the NTU RGB+D, NTU-120, and NW-UCLA benchmarks, the approach achieves accuracy comparable to state-of-the-art GCNs while significantly outperforming existing SNN-based methods. In theoretical estimates, it reduces energy consumption by several orders of magnitude and substantially decreases inference latency, giving a favorable trade-off between recognition accuracy and energy efficiency.
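The theoretical energy comparison mentioned above is usually computed by counting operations and multiplying by a per-operation energy cost. The sketch below shows one common way to do that. It is not taken from the paper: the per-operation energies are the widely quoted 45 nm CMOS figures, and the layer cost, firing rate, and time-step count are hypothetical values chosen only for illustration.

```python
# Assumed per-operation energies (45 nm CMOS figures commonly quoted in SNN work,
# not values reported by this paper).
E_MAC_PJ = 4.6  # 32-bit floating-point multiply-accumulate, in picojoules
E_AC_PJ = 0.9   # 32-bit floating-point accumulate, in picojoules


def ann_energy_pj(macs: float) -> float:
    """Energy of a conventional layer that performs `macs` multiply-accumulates."""
    return macs * E_MAC_PJ


def snn_energy_pj(macs: float, firing_rate: float, time_steps: int) -> float:
    """Energy of a spiking layer: only synapses that receive a spike accumulate.

    firing_rate is the average fraction of neurons that spike per time step.
    """
    if not 0.0 <= firing_rate <= 1.0:
        raise ValueError("firing_rate must be in [0, 1]")
    synaptic_ops = macs * firing_rate * time_steps
    return synaptic_ops * E_AC_PJ


if __name__ == "__main__":
    macs = 1e9  # hypothetical layer cost
    ann = ann_energy_pj(macs)
    snn = snn_energy_pj(macs, firing_rate=0.1, time_steps=4)
    # 1 mJ = 1e9 pJ
    print(f"ANN: {ann / 1e9:.2f} mJ  SNN: {snn / 1e9:.2f} mJ  ratio: {ann / snn:.1f}x")
```

Under this model the saving depends entirely on the firing rate and the number of time steps, so the ratio printed for these illustrative inputs says nothing about the reduction the paper reports.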
📝 Abstract
Graph Convolutional Networks (GCNs) are effective models for skeleton-based action recognition, but their reliance on floating-point computation leads to high energy consumption, which limits their use on battery-powered devices. Spiking Neural Networks (SNNs) are energy-efficient but struggle to model skeleton dynamics, leading to suboptimal solutions. We propose Signal-SGN (Spiking Graph Convolutional Network), which uses the temporal dimension of skeleton sequences as the spike time steps and represents features as multi-dimensional discrete stochastic signals for time-frequency domain feature extraction. It combines a 1D Spiking Graph Convolution (1D-SGC) module and a Frequency Spiking Convolution (FSC) module to extract features from the skeleton represented in spiking form. In addition, a Multi-Scale Wavelet Transform Feature Fusion (MWTF) module extracts dynamic spiking features and captures frequency-specific characteristics, improving classification performance. Experiments on three large-scale datasets show that Signal-SGN exceeds state-of-the-art SNN-based methods in accuracy and computational efficiency, while attaining performance comparable to GCN methods and significantly reducing theoretical energy consumption.
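The idea of using the skeleton sequence's temporal dimension as the spike time steps can be illustrated with a plain leaky integrate-and-fire (LIF) neuron that advances one step per frame. This is a minimal sketch, not the paper's implementation: the function name `lif_over_frames`, the `threshold` and `decay` values, and the hard-reset rule are all assumptions made for the example.

```python
def lif_over_frames(frames, threshold=1.0, decay=0.5):
    """Run LIF neurons over a skeleton feature sequence, one time step per frame.

    frames: list of T frames, each a list of per-channel input currents
            (hypothetical features, e.g. the output of a graph convolution).
    Returns a list of T binary spike vectors.
    """
    if not frames:
        return []
    membrane = [0.0] * len(frames[0])  # one membrane potential per channel
    spikes = []
    for frame in frames:
        out = []
        for i, current in enumerate(frame):
            # Leak the previous potential, then integrate the new input.
            membrane[i] = decay * membrane[i] + current
            if membrane[i] >= threshold:
                out.append(1)
                membrane[i] = 0.0  # hard reset after firing (assumed rule)
            else:
                out.append(0)
        spikes.append(out)
    return spikes


if __name__ == "__main__":
    # Three frames, two feature channels.
    sequence = [[0.75, 0.25], [0.75, 0.25], [0.75, 0.25]]
    print(lif_over_frames(sequence))
```

Because each frame is consumed as one time step, the sequence does not need to be repeated over an extra simulation axis, and the output is a binary tensor that downstream layers can process with accumulate-only operations.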