URLOST: Unsupervised Representation Learning without Stationarity or Topology

📅 2023-10-06
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current unsupervised representation learning relies heavily on stationarity assumptions and topological priors, limiting its ability to model the non-stationary, unstructured, high-dimensional signals common in biological perception. The paper proposes URLOST, an unsupervised representation learning framework that requires neither stationarity nor topology priors. The method combines a learnable self-organizing layer, spectral clustering, and a masked autoencoder (MAE). Evaluated on three heterogeneous, non-stationary data modalities (biologically inspired visual stimuli, neural recordings from the primary visual cortex, and gene expression profiles), the approach outperforms state-of-the-art baselines such as SimCLR and MAE, as well as prior methods that likewise avoid stationarity and topology assumptions. The authors position the work as a step toward unsupervised learning that generalizes across diverse high-dimensional data modalities.
📝 Abstract
Unsupervised representation learning has seen tremendous progress. However, it is constrained by its reliance on domain-specific stationarity and topology, a limitation not found in biological intelligence systems. For instance, unlike computer vision, human vision can process visual signals sampled from highly irregular and non-stationary sensors. We introduce a novel framework that learns from high-dimensional data without prior knowledge of stationarity and topology. Our model, abbreviated as URLOST, combines a learnable self-organizing layer, spectral clustering, and a masked autoencoder (MAE). We evaluate its effectiveness on three diverse data modalities including simulated biological vision data, neural recordings from the primary visual cortex, and gene expressions. Compared to state-of-the-art unsupervised learning methods like SimCLR and MAE, our model excels at learning meaningful representations across diverse modalities without knowing their stationarity or topology. It also outperforms other methods that are not dependent on these factors, setting a new benchmark in the field. We position this work as a step toward unsupervised learning methods capable of generalizing across diverse high-dimensional data modalities.
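To make the pipeline concrete, here is a minimal illustrative sketch of the core idea: when input dimensions have no known spatial layout, a similarity graph estimated from the data itself can group dimensions into "patches" via spectral clustering, which then play the role of the fixed grid patches a standard MAE assumes. This is an assumption-laden simplification, not the authors' implementation; the data shapes, cluster count, and masking ratio below are all hypothetical.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))  # 500 samples, 64 unordered input dimensions

# No topology prior: estimate similarity between dimensions from the data.
affinity = np.abs(np.corrcoef(X.T))  # (64, 64) similarity graph

# Spectral clustering assigns each dimension to one of 8 "patches".
labels = SpectralClustering(
    n_clusters=8, affinity="precomputed", random_state=0
).fit_predict(affinity)
patches = [np.flatnonzero(labels == k) for k in range(8)]

# MAE-style masking: hide a random 75% of patches; an autoencoder would be
# trained to reconstruct the hidden dimensions from the visible ones.
masked = set(rng.permutation(8)[: int(0.75 * 8)].tolist())
visible_dims = np.concatenate([patches[k] for k in range(8) if k not in masked])
```

The learnable self-organizing layer in URLOST would additionally adapt this grouping during training; the fixed clustering above only illustrates how patch structure can be recovered without a known topology.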
Problem

Research questions and friction points this paper is trying to address.

Learning representations without stationarity or topology constraints
Processing high-dimensional data from irregular, non-stationary sources
Generalizing unsupervised learning across diverse data modalities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns without stationarity or topology knowledge
Combines self-organizing layer, spectral clustering, MAE
Excels across diverse high-dimensional data modalities