🤖 AI Summary
To address the high training cost of spiking neural networks (SNNs) in event-based vision, which stems from temporal encoding, we propose PACE, the first dataset distillation framework tailored to event data. The method introduces three key innovations: (1) the ST-DSM module, which enables fine-grained spatiotemporal feature densification with phase alignment; (2) PEQ-N, a plug-and-play probabilistic integer quantizer supporting event-frame-compatible low-bit representations; and (3) residual membrane-potential-driven SDR enhancement combined with synthetic sample optimization. On N-MNIST, PACE achieves 84.4% accuracy using only 15% of the full dataset, reaching about 85% of full-data performance, while accelerating training by 50× and reducing storage by 6000×. This is the first demonstration of minute-scale, highly efficient SNN training on event data.
📝 Abstract
Event cameras sense brightness changes and output binary, asynchronous event streams, and have attracted increasing attention. Their bio-inspired dynamics align well with spiking neural networks (SNNs), offering a promising energy-efficient alternative to conventional vision systems. However, SNNs remain costly to train due to temporal coding, which limits their practical deployment. To alleviate this cost, we introduce **PACE** (Phase-Aligned Condensation for Events), the first dataset distillation framework for SNNs and event-based vision. PACE distills a large training dataset into a compact synthetic one that enables fast SNN training, achieved by two core modules: **ST-DSM** and **PEQ-N**. ST-DSM uses residual membrane potentials to densify spike-based features (SDR) and to perform fine-grained spatiotemporal matching of amplitude and phase (ST-SM), while PEQ-N provides a plug-and-play straight-through probabilistic integer quantizer compatible with standard event-frame pipelines. Across the DVS-Gesture, CIFAR10-DVS, and N-MNIST datasets, PACE outperforms existing coreset selection and dataset distillation baselines, with particularly strong gains on dynamic event streams and at low or moderate IPC. On N-MNIST, it achieves 84.4% accuracy, about 85% of full-training-set performance, while reducing training time by more than 50× and storage cost by 6000×, yielding compact surrogates that enable minute-scale SNN training and efficient edge deployment.
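To make the quantizer idea concrete, here is a minimal sketch of probabilistic low-bit integer quantization in the style the abstract attributes to PEQ-N: values are mapped to an integer grid and rounded up or down stochastically, with probability given by the fractional part, so the quantized value is unbiased in expectation. This is an illustrative assumption, not the paper's implementation; the function name `prob_int_quantize` and its interface are hypothetical, and the straight-through part (identity gradient in the backward pass) is only noted in a comment since only the forward pass is shown.

```python
import numpy as np

def prob_int_quantize(x, num_bits=2, rng=None):
    """Stochastic (probabilistic) rounding to a low-bit integer grid.

    Hypothetical sketch of a PEQ-N-style quantizer. During training, the
    backward pass would use a straight-through estimator (gradients pass
    through the rounding as if it were the identity); here we show only
    the forward quantization.
    """
    rng = np.random.default_rng(rng)
    levels = 2 ** num_bits - 1              # e.g. 2 bits -> integers 0..3
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    u = (x - lo) / scale                    # map onto [0, levels]
    floor = np.floor(u)
    frac = u - floor
    q = floor + (rng.random(x.shape) < frac)  # round up with prob. frac
    return q.astype(np.int64), lo, scale    # dequantize via q * scale + lo

# Usage: averaging many stochastic draws recovers the input in expectation.
x = np.linspace(0.0, 1.0, 5)
q, lo, scale = prob_int_quantize(x, num_bits=2, rng=0)
deq = np.mean(
    [prob_int_quantize(x, 2, s)[0] for s in range(200)], axis=0
) * scale + lo
```

Stochastic rounding of this kind is a standard way to keep low-bit representations unbiased, which is why it pairs naturally with binary event frames.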