🤖 AI Summary
This work addresses the scarcity of large-scale annotated data in event-based vision, which hinders the application of spiking neural networks in few-shot scenarios. To overcome this, the authors propose SpikeCLR, a framework that, for the first time, brings contrastive self-supervised learning to event-based spiking vision. The method incorporates a spatio-temporal-polarity augmentation strategy tailored to event data, enabling robust representation learning from unlabeled streams. The pretrained network is subsequently fine-tuned to improve performance in both few-shot and semi-supervised settings. Experimental results on benchmarks such as CIFAR10-DVS and N-Caltech101 show that SpikeCLR outperforms fully supervised models while using significantly fewer labeled samples, validating the generalization capability and transferability of the learned representations.
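To make the augmentation idea concrete, below is a minimal sketch of what a spatio-temporal-polarity augmentation over raw events could look like. It assumes events are stored as parallel `(x, y, t, p)` arrays; the transform choices, parameter ranges, and function name are illustrative assumptions, not the paper's exact strategy.

```python
# Illustrative sketch (not the paper's implementation) of augmenting a raw event
# stream along spatial, temporal, and polarity axes.
import numpy as np

def augment_events(x, y, t, p, width, height, rng=None):
    """Randomly transform an event stream given as parallel arrays.

    x, y : pixel coordinates; t : timestamps; p : polarity in {0, 1}.
    """
    rng = rng or np.random.default_rng()
    x, y, t, p = x.copy(), y.copy(), t.copy(), p.copy()

    # Spatial: random horizontal flip plus a small integer translation.
    if rng.random() < 0.5:
        x = width - 1 - x
    dx, dy = rng.integers(-10, 11, size=2)
    x, y = x + dx, y + dy

    # Temporal: random time scaling and per-event timestamp jitter.
    t = t * rng.uniform(0.9, 1.1) + rng.normal(0.0, 100.0, size=t.shape)

    # Polarity: occasionally swap ON/OFF events.
    if rng.random() < 0.1:
        p = 1 - p

    # Discard events pushed outside the sensor plane by the translation.
    keep = (x >= 0) & (x < width) & (y >= 0) & (y < height)
    return x[keep], y[keep], t[keep], p[keep]
```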
📝 Abstract
Event-based vision sensors provide significant advantages for high-speed perception, including microsecond temporal resolution, high dynamic range, and low power consumption. When combined with Spiking Neural Networks (SNNs), they can be deployed on neuromorphic hardware, enabling energy-efficient applications on embedded systems. However, this potential is severely limited by the scarcity of the large-scale labeled datasets required to train such models effectively. In this work, we introduce SpikeCLR, a contrastive self-supervised learning framework that enables SNNs to learn robust visual representations from unlabeled event data. We adapt prior frame-based methods to the spiking domain using surrogate gradient training and introduce a suite of event-specific augmentations built on spatial, temporal, and polarity transformations. Through extensive experiments on the CIFAR10-DVS, N-Caltech101, N-MNIST, and DVS-Gesture benchmarks, we demonstrate that self-supervised pretraining followed by fine-tuning outperforms purely supervised learning in low-data regimes, achieving consistent gains in few-shot and semi-supervised settings. Our ablation studies reveal that combining spatial and temporal augmentations is critical for learning effective spatio-temporal invariances in event data. We further show that the learned representations transfer across datasets, contributing to the development of powerful event-based models in label-scarce settings.
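For readers unfamiliar with the two ingredients named above, the following PyTorch sketch shows (i) a surrogate-gradient spike function of the kind used to train SNNs, and (ii) a SimCLR-style NT-Xent contrastive loss over embeddings of two augmented views. The surrogate shape, temperature, and any encoder producing the embeddings are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch, assuming a fast-sigmoid surrogate and a SimCLR-style objective;
# these choices are illustrative, not necessarily those of SpikeCLR.
import torch
import torch.nn.functional as F

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass, smooth surrogate in the backward pass."""
    @staticmethod
    def forward(ctx, membrane_potential):
        ctx.save_for_backward(membrane_potential)
        return (membrane_potential > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        # Derivative of a fast sigmoid, used in place of the Heaviside's zero gradient.
        return grad_output / (1.0 + 10.0 * v.abs()) ** 2

def nt_xent_loss(z1, z2, temperature=0.5):
    """Contrastive loss: embeddings of two views of the same sample attract."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)        # (2N, D), unit norm
    sim = z @ z.t() / temperature                              # cosine similarities
    n = z1.shape[0]
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))                 # exclude self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

In practice, `z1` and `z2` would be the projected outputs of the spiking encoder applied to two independently augmented versions of the same event sample, so the loss pulls matching views together while pushing apart all other samples in the batch.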