MK-SGN: A Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation for Skeleton-based Action Recognition

📅 2024-04-16

🏛️ Neurocomputing

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

To address the challenge of skeleton-based action recognition under stringent power constraints on edge devices, this paper proposes MK-SGN, a Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation, which the authors describe as the first use of Spiking Neural Networks (SNNs) for skeleton-based action recognition. Key contributions include: (1) a Spiking Multimodal Fusion (SMF) module that fuses multimodal skeleton data as spike-form features; (2) an architecture combining Self-Attention Spiking Graph Convolution (SA-SGC) and Spiking Temporal Convolution (STC) to capture spatial relationships and temporal dynamics; and (3) a knowledge distillation strategy that transfers information from a multimodal GCN to the spiking network through intermediate-layer features and soft labels. According to the abstract, MK-SGN reduces energy consumption by more than 98% compared with conventional GCN-based approaches, surpasses state-of-the-art GCN frameworks in energy efficiency, and outperforms state-of-the-art SNN frameworks in recognition accuracy. The authors position it as a baseline for energy-efficient SNN-based skeleton action recognition on resource-constrained edge devices.

📝 Abstract

In recent years, multimodal Graph Convolutional Networks (GCNs) have achieved remarkable performance in skeleton-based action recognition. However, the reliance on high-energy-consuming continuous floating-point operations inherent in GCN-based methods poses significant challenges for deployment on energy-constrained, battery-powered edge devices. To address these limitations, MK-SGN, a Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation, is proposed to leverage the energy efficiency of Spiking Neural Networks (SNNs) for skeleton-based action recognition for the first time. By integrating the energy-saving properties of SNNs with the graph representation capabilities of GCNs, MK-SGN achieves significant reductions in energy consumption while maintaining competitive recognition accuracy. Firstly, we formulate a Spiking Multimodal Fusion (SMF) module to effectively fuse multimodal skeleton data represented as spike-form features. Secondly, we propose the Self-Attention Spiking Graph Convolution (SA-SGC) module and the Spiking Temporal Convolution (STC) module to capture spatial relationships and temporal dynamics of spike-form features. Finally, we propose an integrated knowledge distillation strategy to transfer information from the multimodal GCN to the SGN, incorporating both intermediate-layer distillation and soft-label distillation to enhance the performance of the SGN. MK-SGN exhibits substantial advantages, surpassing state-of-the-art GCN frameworks in energy efficiency and outperforming state-of-the-art SNN frameworks in recognition accuracy. The proposed method reduces energy consumption by more than 98% compared to conventional GCN-based approaches. This research establishes a robust baseline for developing high-performance, energy-efficient SNN-based models for skeleton-based action recognition.
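
The abstract names two distillation signals, intermediate-layer features and soft labels, but does not give the loss itself. The following is a minimal sketch of how such a combined objective is commonly written, not the paper's formulation. The function names, the weights `alpha` and `beta`, the temperature, and the use of KL divergence and mean squared error are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

def feature_loss(student_feat, teacher_feat):
    """Mean squared error between flattened intermediate features.

    Assumption: the spiking student's features are already reduced to real
    values (for example, firing rates averaged over time steps).
    """
    assert len(student_feat) == len(teacher_feat)
    n = len(student_feat)
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / n

def distillation_loss(student_logits, teacher_logits,
                      student_feat, teacher_feat, label,
                      alpha=0.5, beta=0.1, temperature=4.0):
    """Hypothetical total: cross-entropy + soft-label term + feature term."""
    ce = -math.log(softmax(student_logits)[label])
    kd = soft_label_loss(student_logits, teacher_logits, temperature)
    fd = feature_loss(student_feat, teacher_feat)
    return (1 - alpha) * ce + alpha * kd + beta * fd
```

When the student matches the teacher exactly, both distillation terms vanish and only the weighted cross-entropy remains, which is a quick sanity check for any implementation of this kind.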

Problem

Research questions and friction points this paper is trying to address.

Reducing energy consumption in skeleton-based action recognition models

Maintaining recognition accuracy while using energy-efficient spiking neural networks

Fusing multimodal skeleton data and transferring knowledge from GCNs to SNNs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Spiking Neural Networks for energy-efficient action recognition

Integrates multimodal fusion and knowledge distillation techniques

Proposes novel spiking modules for spatial-temporal feature extraction
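
For readers unfamiliar with spike-form features, the sketch below shows a leaky integrate-and-fire (LIF) neuron, a neuron model widely used in SNNs. This summary does not state which neuron model MK-SGN uses, so the LIF choice, the function name `lif_forward`, and the parameters `tau`, `v_threshold`, and `v_reset` are assumptions for illustration only.

```python
def lif_forward(inputs, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """Run one LIF neuron over a sequence of input currents.

    Returns a list of binary spikes (0 or 1), one per time step.
    """
    v = v_reset
    spikes = []
    for x in inputs:
        # Leaky integration: the membrane potential decays toward v_reset
        # while accumulating the current input.
        v = v + (x - (v - v_reset)) / tau
        if v >= v_threshold:
            spikes.append(1)
            v = v_reset  # hard reset after firing
        else:
            spikes.append(0)
    return spikes

# A constant input of 1.5 makes the neuron fire on every second step.
spike_train = lif_forward([1.5, 1.5, 1.5, 1.5])
firing_rate = sum(spike_train) / len(spike_train)
```

Because the outputs are binary and often sparse, downstream layers can replace multiply-accumulate operations with accumulations that occur only when a spike arrives, which is the usual source of the energy savings attributed to SNNs.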

🔎 Similar Papers

Signal-SGN: A Spiking Graph Convolutional Network for Skeletal Action Recognition via Learning Temporal-Frequency Dynamics