Breaking the Modality Wall: Time-step Mixup for Efficient Spiking Knowledge Transfer from Static to Event Domain

📅 2025-11-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing the significant modality gap between static RGB images and sparse event-stream data from dynamic vision sensors (DVS), and the resulting inefficiency of knowledge transfer, this paper proposes a cross-modal knowledge transfer framework, TMKT. Methodologically, TMKT introduces three key components: (1) probabilistic Time-step Mixup (TSM) to construct a smooth learning curriculum within event sequences; (2) Modality Aware Guidance (MAG) and Mixup Ratio Perception (MRP) mechanisms to explicitly align cross-modal temporal features and reduce gradient variance; and (3) cross-modal interpolation and supervision that leverage the inherent asynchrony of spiking neural networks (SNNs). Evaluated on multiple benchmark datasets and mainstream SNN backbones, TMKT consistently achieves substantial improvements in classification accuracy, demonstrating its effectiveness, robustness, and generalization capability across diverse modalities and architectures.
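The core Time-step Mixup idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Beta-distributed, sorted per-time-step ratios are an assumption chosen to produce the "smooth curriculum" the summary describes (early steps closer to RGB, later steps closer to DVS); the function name and signature are hypothetical.

```python
import numpy as np

def time_step_mixup(rgb_img, dvs_frames, rng=None):
    """Mix a static RGB image into each time step of a DVS frame sequence.

    rgb_img:    (C, H, W) static image, broadcast across time steps.
    dvs_frames: (T, C, H, W) event frames binned into T time steps.
    Returns the mixed sequence (T, C, H, W) and per-step mix ratios (T,).
    """
    rng = rng or np.random.default_rng()
    T = dvs_frames.shape[0]
    # Illustrative probabilistic schedule: Beta-distributed ratios, sorted
    # descending so early time steps lean on RGB and later ones on DVS,
    # yielding a smooth within-sequence curriculum (assumed form).
    lam = np.sort(rng.beta(2.0, 2.0, size=T))[::-1]
    # Convex combination per time step.
    mixed = (lam[:, None, None, None] * rgb_img[None]
             + (1.0 - lam[:, None, None, None]) * dvs_frames)
    return mixed, lam
```

Because SNNs process each time step asynchronously, each frame of the mixed sequence can carry a different ratio, which is what distinguishes this from standard (sequence-level) Mixup.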

📝 Abstract
The integration of event cameras and spiking neural networks (SNNs) promises energy-efficient visual intelligence, yet scarce event data and the sparsity of DVS outputs hinder effective training. Prior knowledge transfer from RGB to DVS often underperforms because the distribution gap between modalities is substantial. In this work, we present Time-step Mixup Knowledge Transfer (TMKT), a cross-modal training framework with a probabilistic Time-step Mixup (TSM) strategy. TSM exploits the asynchronous nature of SNNs by interpolating RGB and DVS inputs at various time steps to produce a smooth curriculum within each sequence, which reduces gradient variance and stabilizes optimization, as supported by theoretical analysis. To exploit the auxiliary supervision made available by TSM, TMKT introduces two lightweight modality-aware objectives: Modality Aware Guidance (MAG) for per-frame source supervision and Mixup Ratio Perception (MRP) for sequence-level mix ratio estimation, which explicitly align temporal features with the mixing schedule. TMKT enables smoother knowledge transfer, helps mitigate modality mismatch during training, and achieves superior performance in spiking image classification tasks. Extensive experiments across diverse benchmarks and multiple SNN backbones, together with ablations, demonstrate the effectiveness of our method.
Problem

Research questions and friction points this paper is trying to address.

Bridges modality gap between static RGB and event DVS data
Addresses sparse event data and inefficient SNN training
Enables stable knowledge transfer through temporal interpolation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Time-step Mixup interpolates RGB and DVS inputs
Modality Aware Guidance provides per-frame supervision
Mixup Ratio Perception aligns temporal mixing features
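The two auxiliary objectives above can be sketched as losses on lightweight prediction heads. This is a hedged reconstruction from the abstract's description only: MAG is rendered as a per-frame binary classification of the modality source (RGB vs. DVS), and MRP as a regression of the sequence-level mix ratio; the exact loss forms, the function name, and its arguments are assumptions.

```python
import numpy as np

def modality_aware_losses(step_logits, step_labels, ratio_pred, ratio_true):
    """Illustrative MAG and MRP objectives (assumed forms).

    step_logits: (T,) per-frame logits from a modality head (RGB vs. DVS).
    step_labels: (T,) per-frame source labels in {0, 1}.
    ratio_pred:  predicted sequence-level mix ratio (scalar).
    ratio_true:  ground-truth mix ratio used when building the sequence.
    """
    # MAG: binary cross-entropy over per-frame modality predictions.
    p = 1.0 / (1.0 + np.exp(-step_logits))  # sigmoid
    eps = 1e-12
    mag = -np.mean(step_labels * np.log(p + eps)
                   + (1.0 - step_labels) * np.log(1.0 - p + eps))
    # MRP: squared error on the sequence-level mixing ratio.
    mrp = float((ratio_pred - ratio_true) ** 2)
    return mag, mrp
```

In training, both terms would be added to the classification loss with small weights, pushing the temporal features to encode which modality (and how much of it) each frame carries.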
Yuqi Xie
Ningbo University
Shuhan Ye
Ningbo University, Nanyang Technological University
Yi Yu
Nanyang Technological University
Chong Wang
Ningbo University, Merchants’ Guild Economics and Cultural
Qixin Zhang
Nanyang Technological University
Jiazhen Xu
The Australian National University
Le Shen
Ningbo University
Yuanbin Qian
Ningbo University
Jiangbo Qian
Ningbo University, Merchants’ Guild Economics and Cultural
Guoqi Li
Professor, Institute of Automation, Chinese Academy of Sciences; previously Tsinghua University
Research interests: Brain-inspired computing, Spiking neural networks, Brain-inspired large models, NeuroAI