Signal-SGN: A Spiking Graph Convolutional Network for Skeletal Action Recognition via Learning Temporal-Frequency Dynamics

📅 2024-08-03

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Graph convolutional networks (GCNs) are accurate for skeleton-based action recognition but energy-hungry, while spiking neural networks (SNNs) are energy-efficient but struggle to model skeletal temporal dynamics. To resolve this tension, the paper proposes Signal-SGN, a low-power spiking graph model. The method treats the temporal dimension of a skeleton sequence as the spike time steps and represents features as multi-dimensional discrete stochastic signals in the temporal-frequency domain. It introduces three modules: 1D Spiking Graph Convolution (1D-SGC), Frequency Spiking Convolution (FSC), and Multi-Scale Wavelet Transform Feature Fusion (MWTF), which together enable event-driven modeling of spatial, temporal, and spectral structure. Evaluated on the NTU RGB+D, NTU-120, and NW-UCLA benchmarks, the approach achieves accuracy comparable to GCN methods while outperforming existing SNN-based methods in accuracy and computational efficiency, and it significantly reduces theoretical energy consumption.

📝 Abstract

For skeleton-based action recognition, Graph Convolutional Networks (GCNs) are effective models. Still, their reliance on floating-point computations leads to high energy consumption, limiting their applicability in battery-powered devices. While energy-efficient, Spiking Neural Networks (SNNs) struggle to model skeleton dynamics, leading to suboptimal solutions. We propose Signal-SGN (Spiking Graph Convolutional Network), which utilizes the temporal dimension of skeleton sequences as the spike time steps and represents features as multi-dimensional discrete stochastic signals for temporal-frequency domain feature extraction. It combines the 1D Spiking Graph Convolution (1D-SGC) module and the Frequency Spiking Convolution (FSC) module to extract features from the skeleton represented as spiking form. Additionally, the Multi-Scale Wavelet Transform Feature Fusion (MWTF) module is proposed to extract dynamic spiking features and capture frequency-specific characteristics, enhancing classification performance. Experiments across three large-scale datasets reveal Signal-SGN exceeding state-of-the-art SNN-based methods in accuracy and computational efficiency while attaining comparable performance with GCN methods and significantly reducing theoretical energy consumption.
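
The idea of using the skeleton sequence's temporal dimension as the spike time steps can be illustrated with a minimal sketch. This is not the Signal-SGN implementation: the leaky integrate-and-fire (LIF) neuron, the row-normalized adjacency, the single scalar channel per joint, and every name and constant below (`lif_over_frames`, `threshold`, `decay`, the toy 3-joint chain) are assumptions chosen only to show the mechanism.

```python
# Illustrative sketch only; names, constants, and the toy skeleton are assumptions.

def normalize_adjacency(adj):
    """Add self-loops, then row-normalize so each joint averages over its neighbourhood."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    return [[value / sum(row) for value in row] for row in with_loops]


def graph_aggregate(norm_adj, joint_values):
    """Mix each joint's scalar feature with those of its graph neighbours."""
    return [sum(w * x for w, x in zip(row, joint_values)) for row in norm_adj]


def lif_over_frames(frames, norm_adj, threshold=1.0, decay=0.5):
    """Run one LIF neuron per joint, advancing one spike time step per frame."""
    membrane = [0.0] * len(norm_adj)
    spikes = []
    for frame in frames:  # each skeleton frame is treated as one time step
        current = graph_aggregate(norm_adj, frame)
        membrane = [decay * v + c for v, c in zip(membrane, current)]
        fired = [1 if v >= threshold else 0 for v in membrane]
        # Hard reset: a neuron that fired starts again from zero potential.
        membrane = [0.0 if s else v for v, s in zip(membrane, fired)]
        spikes.append(fired)
    return spikes


# Toy chain of three joints (0-1-2) observed over four frames.
ADJ = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
FRAMES = [[0.9, 0.0, 0.0], [0.9, 0.0, 0.0], [0.0, 0.0, 2.4], [0.0, 0.0, 0.0]]
SPIKES = lif_over_frames(FRAMES, normalize_adjacency(ADJ))
```

In this toy run only joints 1 and 2 fire, and only at the third frame: the weak, sustained input at joint 0 leaks away before reaching the threshold, while the strong input at joint 2 spreads to its neighbour through the graph aggregation. The output is binary, so downstream layers would only need additions rather than multiplications.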

Problem

Research questions and friction points this paper is trying to address.

High energy consumption in GCNs for skeleton action recognition

SNNs' difficulty in modeling skeleton dynamics effectively

Need for efficient temporal-frequency feature extraction in spiking networks
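
The energy gap behind the first item is often quantified in the SNN literature with a simple operation-count model, sketched below. The per-operation energies (4.6 pJ per 32-bit multiply-accumulate and 0.9 pJ per accumulate, figures commonly quoted for 45 nm CMOS) and the function names are assumptions for illustration; this summary does not state which constants or formula the paper itself uses.

```python
# Illustrative sketch only; constants and names are assumptions, not the paper's.

E_MAC_PJ = 4.6  # assumed energy of one 32-bit floating-point multiply-accumulate
E_AC_PJ = 0.9   # assumed energy of one 32-bit floating-point accumulate


def ann_energy_pj(flops):
    """Dense network: every operation is a multiply-accumulate."""
    return flops * E_MAC_PJ


def snn_energy_pj(flops, firing_rate, time_steps):
    """Spiking network: only spikes trigger work, and each one costs an accumulate."""
    synaptic_ops = firing_rate * time_steps * flops
    return synaptic_ops * E_AC_PJ


# Hypothetical layer: 1e9 operations, 10% of neurons firing per step, 4 time steps.
ANN_PJ = ann_energy_pj(1e9)
SNN_PJ = snn_energy_pj(1e9, firing_rate=0.1, time_steps=4)
RATIO = ANN_PJ / SNN_PJ
```

Under this model the saving depends on two factors: the cheaper accumulate operation and the sparsity of spikes. A high firing rate or many time steps can erase the advantage, which is why the estimate is described as theoretical.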

Innovation

Methods, ideas, or system contributions that make the work stand out.

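Represents skeleton features in spiking form, using the temporal dimension of the sequence as the spike time steps

Combines the 1D Spiking Graph Convolution (1D-SGC) and Frequency Spiking Convolution (FSC) modules for temporal-frequency feature extraction

Employs Multi-Scale Wavelet Transform Feature Fusion (MWTF) to extract dynamic spiking features and capture frequency-specific characteristics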
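
To make the multi-scale wavelet idea concrete, the sketch below decomposes a single spike train along the time axis and summarizes each frequency band by its energy. The internal design of the MWTF module is not described in this summary, so the choice of the Haar wavelet, the band-energy summary, and the names `haar_step`, `multi_scale_haar`, and `band_energies` are all assumptions made for illustration.

```python
# Illustrative sketch only; the Haar wavelet and all names are assumptions.
import math


def haar_step(signal):
    """One Haar level: return (approximation, detail), each half the input length.

    A trailing unpaired sample is dropped when the input length is odd.
    """
    scale = 1.0 / math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) * scale for a, b in pairs]
    detail = [(a - b) * scale for a, b in pairs]
    return approx, detail


def multi_scale_haar(signal, levels):
    """Return detail bands ordered fine to coarse, plus the final approximation."""
    bands = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        bands.append(detail)
    return bands, approx


def band_energies(signal, levels):
    """Fuse the scales into one feature vector: energy per band, then the residual."""
    bands, approx = multi_scale_haar(signal, levels)
    return [sum(v * v for v in band) for band in bands] + [sum(v * v for v in approx)]


# A rapidly alternating spike train puts its detail energy in the finest band.
FAST_SPIKES = [1, 0, 1, 0, 1, 0, 1, 0]
ENERGIES = band_energies(FAST_SPIKES, levels=2)
```

For the alternating train, the finest detail band carries energy while the next coarser band is empty, so the resulting vector separates fast motion patterns from slow ones. Because the Haar transform is orthonormal, the band energies sum to the energy of the original signal when its length is a multiple of 2 raised to the number of levels.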

🔎 Similar Papers

MK-SGN: A Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation for Skeleton-based Action Recognition