🤖 AI Summary
To address the substantial modality gap between static RGB images and sparse event streams from dynamic vision sensors (DVS), and the inefficient knowledge transfer that results, this paper proposes Time-step Mixup Knowledge Transfer (TMKT), a cross-modal training framework. TMKT introduces three key components: (1) a probabilistic Time-step Mixup (TSM) strategy that exploits the asynchronous processing of spiking neural networks (SNNs) to interpolate RGB and DVS inputs at different time steps, forming a smooth learning curriculum within each event sequence; (2) Modality-Aware Guidance (MAG) for per-frame source supervision; and (3) Mixup Ratio Perception (MRP) for sequence-level mixing-ratio estimation, which together align temporal features with the mixing schedule and reduce gradient variance. Evaluated on multiple benchmark datasets and mainstream SNN backbones, TMKT consistently improves classification accuracy, demonstrating its effectiveness, robustness, and generalization across modalities and architectures.
📝 Abstract
The integration of event cameras and spiking neural networks (SNNs) promises energy-efficient visual intelligence, yet scarce event data and the sparsity of DVS outputs hinder effective training. Prior knowledge transfer from RGB to DVS often underperforms because the distribution gap between the modalities is substantial. In this work, we present Time-step Mixup Knowledge Transfer (TMKT), a cross-modal training framework with a probabilistic Time-step Mixup (TSM) strategy. TSM exploits the asynchronous nature of SNNs by interpolating RGB and DVS inputs at different time steps to produce a smooth curriculum within each sequence, which reduces gradient variance and stabilizes optimization, as supported by theoretical analysis. To exploit auxiliary supervision from TSM, TMKT introduces two lightweight modality-aware objectives: Modality-Aware Guidance (MAG) for per-frame source supervision and Mixup Ratio Perception (MRP) for sequence-level mixing-ratio estimation, which explicitly align temporal features with the mixing schedule. TMKT enables smoother knowledge transfer, helps mitigate modality mismatch during training, and achieves superior performance on spiking image classification tasks. Extensive experiments across diverse benchmarks and multiple SNN backbones, together with ablations, demonstrate the effectiveness of our method.
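The core idea of TSM, interpolating a static RGB frame with event frames on a per-time-step basis, can be illustrated with a minimal sketch. The function name, the Beta-distributed per-step mixing ratios, and the tensor layout below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def time_step_mixup(rgb_frame, dvs_frames, rng=None):
    """Hypothetical sketch of probabilistic time-step mixup.

    rgb_frame:  (C, H, W) static image, implicitly replicated across time.
    dvs_frames: (T, C, H, W) event-voxel frames, one per SNN time step.
    Returns the mixed sequence and the per-step mixing ratios.
    """
    rng = rng or np.random.default_rng()
    num_steps = dvs_frames.shape[0]
    # One mixing ratio per time step (assumed Beta(2, 2) schedule),
    # so each step sees a different RGB/DVS blend.
    lambdas = rng.beta(2.0, 2.0, size=num_steps)
    mixed = np.stack([
        lam * rgb_frame + (1.0 - lam) * dvs
        for lam, dvs in zip(lambdas, dvs_frames)
    ])
    return mixed, lambdas
```

Because each time step draws its own ratio, the sequence spans a range of RGB-to-DVS blends, which is the "smooth curriculum within each sequence" the abstract describes; the returned `lambdas` would serve as targets for an MRP-style ratio-estimation objective.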