PredNext: Explicit Cross-View Temporal Prediction for Unsupervised Learning in Spiking Neural Networks

๐Ÿ“… 2025-09-29
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing unsupervised spiking neural networks (SNNs) rely on shallow architectures and local plasticity rules, limiting their capacity to capture long-range temporal dependencies and yielding semantically unstable video representations. To address this, we propose a deep unsupervised SNN representation-learning framework centered on the PredNext module: a clip-level cross-view future-prediction mechanism that explicitly models multi-step temporal relationships, improving temporal feature consistency without imposing forced consistency constraints and thereby enhancing feature stability and generalization. The module is plug-and-play and compatible with diverse self-supervised objectives. We establish the first standardized SNN self-supervised benchmarks on UCF101, HMDB51, and MiniKinetics. Trained solely on UCF101 without labels, our model achieves performance comparable to ImageNet-supervised pretrained models, marking a significant advance toward large-scale, temporally grounded unsupervised learning with deep SNNs.

๐Ÿ“ Abstract
Spiking Neural Networks (SNNs), with their temporal processing capabilities and biologically plausible dynamics, offer a natural platform for unsupervised representation learning. However, current unsupervised SNNs predominantly employ shallow architectures or localized plasticity rules, limiting their ability to model long-range temporal dependencies and maintain temporal feature consistency. This results in semantically unstable representations, thereby impeding the development of deep unsupervised SNNs for large-scale temporal video data. We propose PredNext, which explicitly models temporal relationships through cross-view future Step Prediction and Clip Prediction. This plug-and-play module integrates seamlessly with diverse self-supervised objectives. We are the first to establish standard benchmarks for SNN self-supervised learning on UCF101, HMDB51, and MiniKinetics, which are substantially larger than conventional DVS datasets. PredNext delivers significant performance improvements across different tasks and self-supervised methods, and achieves performance comparable to ImageNet-pretrained supervised weights through unsupervised training solely on UCF101. Additional experiments demonstrate that PredNext, unlike forced consistency constraints, substantially improves temporal feature consistency while enhancing network generalization. This work provides an effective foundation for unsupervised deep SNNs on large-scale temporal video data.
Problem

Research questions and friction points this paper is trying to address.

Addresses unstable representations in deep unsupervised spiking neural networks
Models long-range temporal dependencies through cross-view prediction mechanisms
Enables unsupervised learning on large-scale temporal video datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explicit cross-view temporal prediction for unsupervised learning
Plug-and-play module integrating with self-supervised objectives
Improves temporal feature consistency and network generalization capabilities
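The cross-view future-prediction idea above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it assumes per-timestep clip features from two augmented views, a hypothetical linear predictor `W` standing in for the prediction head, and a negative-cosine objective (a common choice in self-supervised learning; the paper's exact loss is not specified here).

```python
# Hedged sketch of a cross-view future-step prediction objective.
# All names (prednext_step_loss, W, k) are illustrative assumptions,
# not the authors' API.
import numpy as np

rng = np.random.default_rng(0)

def cosine_loss(p, z):
    """Negative mean cosine similarity between predictions and targets."""
    p = p / np.linalg.norm(p, axis=-1, keepdims=True)
    z = z / np.linalg.norm(z, axis=-1, keepdims=True)
    return float(-np.mean(np.sum(p * z, axis=-1)))

def prednext_step_loss(feat_a, feat_b, W, k=1):
    """Predict view-B features k steps ahead from view-A features.

    feat_a, feat_b: (T, D) per-timestep features of two augmented views
                    of the same clip (e.g. outputs of an SNN encoder).
    W: (D, D) linear predictor, a stand-in for the prediction head.
    """
    pred = feat_a[:-k] @ W   # predictions for timesteps k .. T-1
    target = feat_b[k:]      # future features of the other view
    return cosine_loss(pred, target)

T, D = 8, 16
feat_a = rng.standard_normal((T, D))
feat_b = rng.standard_normal((T, D))
W = np.eye(D)
loss = prednext_step_loss(feat_a, feat_b, W, k=1)
print(loss)
```

Because the objective only asks one view to *predict* the other's future rather than forcing the two views' features to be identical, it encourages temporal consistency without a hard consistency constraint, which matches the paper's stated distinction.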
๐Ÿ”Ž Similar Papers
No similar papers found.