SpikeCLR: Contrastive Self-Supervised Learning for Few-Shot Event-Based Vision using Spiking Neural Networks

📅 2026-03-17
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work addresses the scarcity of large-scale annotated data in event-based vision, which limits the use of spiking neural networks in few-shot scenarios. To overcome this, the authors propose SpikeCLR, a novel framework that introduces contrastive self-supervised learning to event-based spiking vision for the first time. The method incorporates a spatio-temporal-polarity augmentation strategy tailored to event data, enabling robust representation learning from unlabeled streams. The learned representations are then fine-tuned for both few-shot and semi-supervised settings. Experiments on benchmarks such as CIFAR10-DVS and N-Caltech101 show that SpikeCLR outperforms fully supervised models while using significantly fewer labeled samples, validating both the generalization and transferability of the learned representations.
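As a rough illustration of what a spatio-temporal-polarity augmentation could look like, the Python/NumPy sketch below applies one transform per axis to a raw event stream: a spatial flip plus translation, a temporal rescaling, and a polarity inversion. The transform choices, parameter ranges, and the (x, y, t, p) event layout are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def augment_events(events, width=128, height=128, rng=None):
    """Illustrative spatio-temporal-polarity augmentation for an event
    stream stored as a float array of shape (N, 4) with columns
    (x, y, t, p), p in {-1, +1}. These are common event-data transforms;
    SpikeCLR's actual augmentation suite may differ."""
    rng = rng or np.random.default_rng()
    ev = events.copy()

    # Spatial: random horizontal flip and a small random translation.
    if rng.random() < 0.5:
        ev[:, 0] = (width - 1) - ev[:, 0]
    dx, dy = rng.integers(-8, 9, size=2)
    ev[:, 0] = np.clip(ev[:, 0] + dx, 0, width - 1)
    ev[:, 1] = np.clip(ev[:, 1] + dy, 0, height - 1)

    # Temporal: random rescaling of timestamps (simulates speed changes).
    ev[:, 2] = ev[:, 2] * rng.uniform(0.8, 1.2)

    # Polarity: random global polarity inversion.
    if rng.random() < 0.5:
        ev[:, 3] = -ev[:, 3]

    return ev
```

Applying this function twice to the same event stream yields the two correlated views a contrastive objective would pull together.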

📝 Abstract
Event-based vision sensors provide significant advantages for high-speed perception, including microsecond temporal resolution, high dynamic range, and low power consumption. When combined with Spiking Neural Networks (SNNs), they can be deployed on neuromorphic hardware, enabling energy-efficient applications on embedded systems. However, this potential is severely limited by the scarcity of large-scale labeled datasets required to effectively train such models. In this work, we introduce SpikeCLR, a contrastive self-supervised learning framework that enables SNNs to learn robust visual representations from unlabeled event data. We adapt prior frame-based methods to the spiking domain using surrogate gradient training and introduce a suite of event-specific augmentations that leverage spatial, temporal, and polarity transformations. Through extensive experiments on CIFAR10-DVS, N-Caltech101, N-MNIST, and DVS-Gesture benchmarks, we demonstrate that self-supervised pretraining with subsequent fine-tuning outperforms supervised learning in low-data regimes, achieving consistent gains in few-shot and semi-supervised settings. Our ablation studies reveal that combining spatial and temporal augmentations is critical for learning effective spatio-temporal invariances in event data. We further show that learned representations transfer across datasets, contributing to the development of powerful event-based models in label-scarce settings.
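The abstract states that prior frame-based contrastive methods were adapted to the spiking domain, which suggests a SimCLR-style objective. The PyTorch sketch below implements the standard NT-Xent loss over two batches of projected views; that SpikeCLR uses this exact loss, and the temperature value, are assumptions for illustration rather than details confirmed by the abstract.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """Standard NT-Xent (SimCLR) loss over two batches of projections
    z1, z2 of shape (B, D), one row per augmented view of each sample.
    Assumed here as the contrastive objective; the abstract only says
    frame-based methods were adapted, without naming the loss."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D), unit norm
    sim = z @ z.t() / temperature                        # cosine similarities
    sim.fill_diagonal_(float('-inf'))                    # mask self-pairs
    B = z1.size(0)
    # The positive for row i is row i+B, and vice versa.
    targets = torch.cat([torch.arange(B) + B, torch.arange(B)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

In a full pipeline, each event sample would be augmented twice, encoded by a surrogate-gradient-trained SNN, projected to z1 and z2, and optimized with this loss.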
Problem

Research questions and friction points this paper is trying to address.

event-based vision
spiking neural networks
few-shot learning
label scarcity
self-supervised learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spiking Neural Networks
Contrastive Learning
Event-based Vision
Self-supervised Learning
Data Augmentation
Maxime Vaillant
CNRS, IPAL IRL 2955, Singapore; Université de Toulouse, IRIT, France; Institute for Infocomm Research, A*STAR, Singapore
Axel Carlier
ISAE-SUPAERO
AI, Multimedia
Lai Xing Ng
CNRS, IPAL IRL 2955, Singapore; Institute for Infocomm Research, A*STAR, Singapore
Christophe Hurter
Professor at ENAC, Fédération ENAC ISAE-SUPAERO ONERA, France; IPAL, Singapore
User-Centered AI, Visual Intelligence, Information Visualization, Immersive Analytics, Air Traffic
Benoit R. Cottereau
CNRS, IPAL IRL 2955, Singapore; CerCo, CNRS UMR 5549, Université de Toulouse, France