MK-SGN: A Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation for Skeleton-based Action Recognition

📅 2024-04-16
🏛️ Neurocomputing
📈 Citations: 2
Influential: 0

🤖 AI Summary
To address skeleton-based action recognition under the stringent power constraints of edge devices, this paper proposes MK-SGN, a Spiking Graph Convolutional Network with multimodal fusion and knowledge distillation, which the authors describe as the first use of Spiking Neural Networks (SNNs) for skeleton-based action recognition. Key contributions include: (1) a Spiking Multimodal Fusion (SMF) module that fuses multimodal skeleton data as spike-form features; (2) a joint architecture combining Self-Attention Spiking Graph Convolution (SA-SGC) for spatial relationships and Spiking Temporal Convolution (STC) for temporal dynamics; and (3) an integrated knowledge distillation strategy that transfers information from a multimodal GCN to the spiking network using intermediate-layer features and soft labels. According to the abstract, MK-SGN reduces energy consumption by over 98% compared to conventional GCN-based approaches, outperforms state-of-the-art SNN frameworks in recognition accuracy, and keeps accuracy competitive with GCN frameworks. The authors present it as a baseline for high-performance, energy-efficient SNN-based skeleton action recognition.

📝 Abstract
In recent years, multimodal Graph Convolutional Networks (GCNs) have achieved remarkable performance in skeleton-based action recognition. The reliance on high-energy-consuming continuous floating-point operations inherent in GCN-based methods poses significant challenges for deployment in energy-constrained, battery-powered edge devices. To address these limitations, MK-SGN, a Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation, is proposed to leverage the energy efficiency of Spiking Neural Networks (SNNs) for skeleton-based action recognition for the first time. By integrating the energy-saving properties of SNNs with the graph representation capabilities of GCNs, MK-SGN achieves significant reductions in energy consumption while maintaining competitive recognition accuracy. Firstly, we formulate a Spiking Multimodal Fusion (SMF) module to effectively fuse multimodal skeleton data represented as spike-form features. Secondly, we propose the Self-Attention Spiking Graph Convolution (SA-SGC) module and the Spiking Temporal Convolution (STC) module to capture spatial relationships and temporal dynamics of spike-form features. Finally, we propose an integrated knowledge distillation strategy to transfer information from the multimodal GCN to the SGN, incorporating both intermediate-layer distillation and soft-label distillation to enhance the performance of the SGN. MK-SGN exhibits substantial advantages, surpassing state-of-the-art GCN frameworks in energy efficiency and outperforming state-of-the-art SNN frameworks in recognition accuracy. The proposed method achieves a remarkable reduction in energy consumption, exceeding 98% compared to conventional GCN-based approaches. This research establishes a robust baseline for developing high-performance, energy-efficient SNN-based models for skeleton-based action recognition.
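
The abstract names two distillation signals, intermediate-layer features and soft labels, but does not give the loss itself. The following is a minimal sketch of how such a combined objective is commonly written, assuming a temperature-softened KL term for the soft labels and a mean-squared-error term for the features. The function names, the weights `alpha` and `beta`, and the temperature value are illustrative assumptions, not the authors' formulation.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

def feature_loss(student_feat, teacher_feat):
    """Mean squared error between two equal-length feature vectors."""
    diffs = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(diffs) / len(diffs)

def cross_entropy(logits, label):
    """Standard cross-entropy against a hard class label."""
    return -math.log(softmax(logits)[label])

def distillation_loss(student_logits, teacher_logits,
                      student_feat, teacher_feat, label,
                      alpha=0.5, beta=0.1, temperature=4.0):
    """Hard-label loss plus weighted soft-label and feature terms.

    alpha, beta and temperature are hypothetical hyperparameters.
    """
    return (cross_entropy(student_logits, label)
            + alpha * soft_label_loss(student_logits, teacher_logits, temperature)
            + beta * feature_loss(student_feat, teacher_feat))
```

In a spiking student, `student_feat` would typically be something like a firing rate averaged over time steps so that it can be compared with the teacher's real-valued features; that choice is also an assumption here.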
Problem

Research questions and friction points this paper is trying to address.

Reducing energy consumption in skeleton-based action recognition models
Maintaining recognition accuracy while using energy-efficient spiking neural networks
Fusing multimodal skeleton data and transferring knowledge from GCNs to SNNs
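
For context on how energy comparisons between GCNs and SNNs are often made, here is a rough sketch of a widely used estimate: a floating-point network pays for a multiply-accumulate (MAC) per connection, while a spiking network pays for an accumulate (AC) only when a spike arrives. The per-operation energies (4.6 pJ per MAC, 0.9 pJ per AC, figures commonly quoted for 32-bit operations on 45 nm hardware) and every name below are assumptions for illustration. This summary does not state how the paper computes its 98% figure, and the sketch does not reproduce it.

```python
E_MAC_PJ = 4.6  # assumed energy per 32-bit float multiply-accumulate, in pJ
E_AC_PJ = 0.9   # assumed energy per 32-bit float accumulate, in pJ

def ann_energy_pj(macs):
    """Energy of a floating-point layer: one MAC per connection."""
    return macs * E_MAC_PJ

def snn_energy_pj(synaptic_ops, firing_rate, timesteps):
    """Energy of a spiking layer: an accumulate only where a spike occurs.

    firing_rate is the average fraction of inputs that spike per time step.
    """
    return synaptic_ops * firing_rate * timesteps * E_AC_PJ

def reduction_percent(ann_pj, snn_pj):
    """Relative energy saving of the spiking layer, in percent."""
    return 100.0 * (1.0 - snn_pj / ann_pj)
```

Under this model the saving depends on the product of firing rate and time steps, so sparse spiking and few time steps are what drive large reductions.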
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Spiking Neural Networks for energy-efficient action recognition
Integrates multimodal fusion and knowledge distillation techniques
Proposes novel spiking modules for spatial-temporal feature extraction
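
To make "spike-form features" concrete, the sketch below simulates a single leaky integrate-and-fire (LIF) neuron, a common building block in SNNs. This summary does not say which neuron model MK-SGN uses, so the LIF choice, the function name, and the parameter values are assumptions for illustration only.

```python
def lif_forward(inputs, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """Run one LIF neuron over a sequence of input currents.

    Returns a list of 0/1 spikes, one per time step.
    """
    v = v_reset
    spikes = []
    for x in inputs:
        # Leaky integration: the membrane potential moves toward the input.
        v = v + (x - (v - v_reset)) / tau
        if v >= v_threshold:
            spikes.append(1)
            v = v_reset  # hard reset after firing
        else:
            spikes.append(0)
    return spikes
```

Because the output is binary, downstream layers can replace multiplications with conditional additions, which is the property the energy argument relies on.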
Naichuan Zheng
Beijing Laboratory of Advanced Information Networks, Beijing Key Laboratory of Network System Architecture and Convergence, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Hailun Xia
Beijing Laboratory of Advanced Information Networks, Beijing Key Laboratory of Network System Architecture and Convergence, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Zeyu Liang
Beijing Laboratory of Advanced Information Networks, Beijing Key Laboratory of Network System Architecture and Convergence, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, 100876, China