Head-Tail-Aware KL Divergence in Knowledge Distillation for Spiking Neural Networks

📅 2025-04-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited generalization of Spiking Neural Networks (SNNs) in knowledge distillation—caused by conventional KL divergence’s overemphasis on high-probability predictions while neglecting low-probability regions—this paper proposes Head-Tail Aware KL divergence (HTA-KL) distillation. The core innovation is a novel cumulative-probability-based dynamic masking mechanism that adaptively partitions the output distribution into “head” (high-probability) and “tail” (low-probability) regions, jointly optimizing both forward and reverse KL divergences to achieve global alignment of SNN output distributions. This enables a cross-paradigm distillation framework bridging SNNs and Artificial Neural Networks (ANNs). Evaluated on CIFAR-10, CIFAR-100, and Tiny ImageNet, HTA-KL consistently outperforms state-of-the-art methods, achieving higher accuracy with fewer simulation time steps—thereby simultaneously improving energy efficiency and generalization capability.

📝 Abstract
Spiking Neural Networks (SNNs) have emerged as a promising approach for energy-efficient and biologically plausible computation. However, due to limitations in existing training methods and inherent model constraints, SNNs often exhibit a performance gap compared to Artificial Neural Networks (ANNs). Knowledge distillation (KD) has been explored as a technique to transfer knowledge from ANN teacher models to SNN student models to mitigate this gap. Traditional KD methods typically use Kullback-Leibler (KL) divergence to align output distributions. However, conventional KL-based approaches fail to fully exploit the unique characteristics of SNNs, as they tend to overemphasize high-probability predictions while neglecting low-probability ones, leading to suboptimal generalization. To address this, we propose Head-Tail Aware Kullback-Leibler (HTA-KL) divergence, a novel KD method for SNNs. HTA-KL introduces a cumulative probability-based mask to dynamically distinguish between high- and low-probability regions. It assigns adaptive weights to ensure balanced knowledge transfer, enhancing overall performance. By integrating forward KL (FKL) and reverse KL (RKL) divergence, our method effectively aligns both the head and tail regions of the distribution. We evaluate our method on the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets. Our method outperforms existing methods on most datasets with fewer timesteps.
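The head/tail partition described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the paper's actual implementation: the function name `hta_kl`, the cumulative-mass threshold `cum_threshold`, and the region weights are all hypothetical, and the paper's adaptive weighting is simplified here to fixed scalars. The sketch sorts the teacher distribution, marks the classes covering the top portion of cumulative teacher probability as the "head", applies forward KL on the head, and reverse KL on the tail.

```python
import numpy as np

def hta_kl(teacher_probs, student_probs, cum_threshold=0.5,
           head_weight=1.0, tail_weight=1.0, eps=1e-8):
    """Illustrative sketch of a head-tail-aware KL loss (names hypothetical).

    Classes are sorted by descending teacher probability; those whose
    cumulative teacher mass stays within `cum_threshold` form the head,
    the remainder the tail. Forward KL aligns the head, reverse KL the tail.
    """
    t = np.asarray(teacher_probs, dtype=float)
    s = np.asarray(student_probs, dtype=float)

    order = np.argsort(-t)            # indices, descending teacher probability
    cum = np.cumsum(t[order])         # cumulative teacher mass in sorted order

    head_mask = np.zeros_like(t, dtype=bool)
    head_mask[order[cum <= cum_threshold]] = True
    head_mask[order[0]] = True        # top class always belongs to the head
    tail_mask = ~head_mask

    # Forward KL on the head: teacher-weighted log-ratio (mode-covering)
    fkl = np.sum(t[head_mask] * np.log((t[head_mask] + eps) / (s[head_mask] + eps)))
    # Reverse KL on the tail: student-weighted log-ratio (mode-seeking)
    rkl = np.sum(s[tail_mask] * np.log((s[tail_mask] + eps) / (t[tail_mask] + eps)))

    return head_weight * fkl + tail_weight * rkl
```

When teacher and student distributions match, both terms vanish; mismatches in either the high-probability head or the low-probability tail contribute to the loss, which is the balance the paper argues plain forward KL lacks.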
Problem

Research questions and friction points this paper is trying to address.

Bridging performance gap between SNNs and ANNs via knowledge distillation
Improving KL divergence for balanced head-tail probability learning
Enhancing SNN generalization with adaptive knowledge transfer weights
Innovation

Methods, ideas, or system contributions that make the work stand out.

HTA-KL divergence for balanced knowledge transfer
Cumulative probability-based mask for dynamic weighting
Combines forward and reverse KL divergences effectively
Tianqing Zhang
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Zixin Zhu
ZJU-UIUC Institute, Zhejiang University, Haining, China
Kairong Yu
Zhejiang University
Computer Vision · Multimodal Learning · Spiking Neural Network
Hongwei Wang
College of Computer Science and Technology, Zhejiang University, Hangzhou, China; ZJU-UIUC Institute, Zhejiang University, Haining, China