Signal-SGN: A Spiking Graph Convolutional Network for Skeletal Action Recognition via Learning Temporal-Frequency Dynamics

📅 2024-08-03
🏛️ arXiv.org
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Graph convolutional networks (GCNs) are accurate for skeleton-based action recognition but energy-hungry, while spiking neural networks (SNNs) are energy-efficient but struggle to model skeletal temporal dynamics. This paper proposes Signal-SGN, a spiking graph convolutional network that targets both low energy use and high accuracy. The method treats the temporal dimension of a skeleton sequence as the spike time steps and represents features as multi-dimensional discrete stochastic signals in the temporal-frequency domain. It introduces three modules: 1D Spiking Graph Convolution (1D-SGC), Frequency Spiking Convolution (FSC), and Multi-Scale Wavelet Transform Feature Fusion (MWTF), which together extract spatial, temporal, and frequency features from spike-form skeleton data. On the NTU RGB+D, NTU RGB+D 120, and NW-UCLA benchmarks, Signal-SGN outperforms existing SNN-based methods in accuracy and computational efficiency, and reaches accuracy comparable to GCN methods while significantly reducing theoretical energy consumption.

📝 Abstract
For skeleton-based action recognition, Graph Convolutional Networks (GCNs) are effective models. Still, their reliance on floating-point computations leads to high energy consumption, limiting their applicability in battery-powered devices. While energy-efficient, Spiking Neural Networks (SNNs) struggle to model skeleton dynamics, leading to suboptimal solutions. We propose Signal-SGN (Spiking Graph Convolutional Network), which uses the temporal dimension of skeleton sequences as the spike time steps and represents features as multi-dimensional discrete stochastic signals for temporal-frequency domain feature extraction. It combines the 1D Spiking Graph Convolution (1D-SGC) module and the Frequency Spiking Convolution (FSC) module to extract features from the skeleton represented in spiking form. Additionally, the Multi-Scale Wavelet Transform Feature Fusion (MWTF) module is proposed to extract dynamic spiking features and capture frequency-specific characteristics, enhancing classification performance. Experiments on three large-scale datasets show that Signal-SGN exceeds state-of-the-art SNN-based methods in accuracy and computational efficiency while attaining performance comparable to GCN methods and significantly reducing theoretical energy consumption.
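The idea of using the temporal dimension of a skeleton sequence as the spike time steps can be illustrated with a single spiking neuron that advances one step per frame. This is a minimal sketch, not the paper's implementation: the abstract does not name the neuron model, so a leaky integrate-and-fire (LIF) neuron with a hard reset is assumed, and `lif_spikes`, `threshold`, `decay`, and the input values are all hypothetical.

```python
from typing import List


def lif_spikes(currents: List[float], threshold: float = 1.0, decay: float = 0.5) -> List[int]:
    """Run one leaky integrate-and-fire neuron over a sequence.

    Each element of `currents` is the input at one time step; here one time
    step corresponds to one skeleton frame (an assumption for illustration).
    """
    membrane = 0.0
    spikes = []
    for current in currents:
        # Leaky integration: the old potential decays, the new input is added.
        membrane = decay * membrane + current
        if membrane >= threshold:
            spikes.append(1)   # emit a binary spike
            membrane = 0.0     # hard reset after firing
        else:
            spikes.append(0)
    return spikes


# Hypothetical per-frame feature of one joint channel (six frames).
frames = [0.6, 0.6, 0.1, 0.9, 0.3, 0.8]
spike_train = lif_spikes(frames)
print(spike_train)  # prints [0, 0, 0, 1, 0, 0]
```

The output is a binary sequence with the same length as the input, so no extra simulation time axis is needed: the frame index doubles as the spike time step.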
Problem

Research questions and friction points this paper is trying to address.

High energy consumption in GCNs for skeleton action recognition
SNNs' inability to effectively model skeleton dynamics
Need for efficient temporal-frequency feature extraction in spiking networks
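The energy gap behind the first point can be made concrete with a rough back-of-the-envelope estimate. The sketch below follows a convention common in SNN papers, not a formula taken from this paper: a floating-point network pays for one multiply-accumulate (MAC) per operation, while a spiking network pays for one accumulate (AC) only when a spike arrives. The per-operation energies are widely cited 45 nm estimates and are assumptions here, as are the function names, the operation count, the firing rate, and the number of time steps.

```python
E_MAC_PJ = 4.6  # assumed energy of one 32-bit float multiply-accumulate, in pJ
E_AC_PJ = 0.9   # assumed energy of one 32-bit float accumulate, in pJ
PJ_TO_MJ = 1e-9  # 1 pJ = 1e-12 J = 1e-9 mJ


def ann_energy_mj(flops: float) -> float:
    """Theoretical energy of a floating-point network: every operation is a MAC."""
    return flops * E_MAC_PJ * PJ_TO_MJ


def snn_energy_mj(flops: float, firing_rate: float, time_steps: int) -> float:
    """Theoretical energy of a spiking network.

    Synaptic operations are counted as flops * firing_rate * time_steps,
    and each one costs a single accumulate.
    """
    synaptic_ops = flops * firing_rate * time_steps
    return synaptic_ops * E_AC_PJ * PJ_TO_MJ


# Hypothetical workload: 1e9 operations, 10% average firing rate, 4 time steps.
gcn_mj = ann_energy_mj(1e9)
snn_mj = snn_energy_mj(1e9, firing_rate=0.1, time_steps=4)
print(f"floating-point: {gcn_mj:.2f} mJ, spiking: {snn_mj:.2f} mJ")
```

Under these assumed numbers the saving comes from two factors: accumulates are cheaper than multiply-accumulates, and sparse firing means most synapses do no work at a given step.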
Innovation

Methods, ideas, or system contributions that make the work stand out.

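Represents skeleton features as spike signals, using the temporal dimension of the sequence as the spike time steps
Combines the 1D Spiking Graph Convolution (1D-SGC) and Frequency Spiking Convolution (FSC) modules to extract features from the spike-form skeleton
Employs the Multi-Scale Wavelet Transform Feature Fusion (MWTF) module to extract dynamic spiking features and capture frequency-specific characteristics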
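To give a feel for what a multi-scale wavelet view of a temporal signal looks like, the sketch below decomposes a short binary sequence with the Haar wavelet. It is an illustration only: the source does not say which wavelet family or fusion scheme MWTF uses, so Haar is a stand-in, and `haar_step`, `haar_multiscale`, and the input sequence are hypothetical. The input length is assumed to be divisible by 2 raised to the number of levels.

```python
import math
from typing import List, Tuple


def haar_step(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform on an even-length signal."""
    scale = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in pairs]  # low-pass
    detail = [(signal[i] - signal[i + 1]) * scale for i in pairs]  # high-pass
    return approx, detail


def haar_multiscale(signal: List[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Repeatedly split the low-pass band, collecting detail bands fine to coarse."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2:
            break  # nothing left to split
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Hypothetical spike train over eight time steps with a period of four steps.
spikes = [1, 1, 0, 0, 1, 1, 0, 0]
approx, details = haar_multiscale(spikes, levels=2)
print("coarse approximation:", approx)
print("detail bands (fine to coarse):", details)
```

For this input the finest detail band is all zeros and the activity shows up in the second band, which matches the four-step period of the sequence. Each band isolates one temporal scale, which is the kind of frequency-specific information a fusion module could combine.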
👥 Authors
Naichuan Zheng
Beijing Laboratory of Advanced Information Networks, Beijing Key Laboratory of Network System Architecture and Convergence, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China, 100876
Hailun Xia
Beijing Laboratory of Advanced Information Networks, Beijing Key Laboratory of Network System Architecture and Convergence, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China, 100876
Dapeng Liu
Beijing Laboratory of Advanced Information Networks, Beijing Key Laboratory of Network System Architecture and Convergence, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China, 100876