🤖 AI Summary
To address weak predictive capability, poor disentanglement, and insufficient robustness in self-supervised representation learning, this paper proposes a self-supervised learning framework grounded in trajectory straightness, a geometric property of neural representation dynamics over time. It introduces the first explicit, differentiable formulation of trajectory straightness as an optimizable objective, using synthetically rendered smooth video sequences to enforce temporal consistency. The method jointly optimizes prediction accuracy and the disentanglement of geometric, photometric, and semantic attributes, while a straightness-aware loss and plug-and-play regularization improve robustness, requiring no additional annotations. Experiments demonstrate that the learned representations support future-frame prediction by linear extrapolation, are significantly more robust to noise and adversarial perturbations, and serve as a modular enhancement that boosts mainstream self-supervised learning methods without architectural modification.
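As a rough illustration of what a differentiable straightness measure can look like (the paper's exact formulation may differ; this is a minimal sketch, not the authors' code), one common definition scores a trajectory by the mean cosine similarity between successive displacement vectors, which equals 1 for a perfectly straight path:

```python
import numpy as np

def straightness(z, eps=1e-8):
    """Mean cosine similarity between successive displacements of a
    trajectory z with shape (T, D). Returns 1.0 for a perfectly
    straight trajectory; lower values indicate curvature."""
    d = np.diff(z, axis=0)                                   # (T-1, D) displacements
    d = d / (np.linalg.norm(d, axis=1, keepdims=True) + eps) # unit displacements
    return float(np.mean(np.sum(d[:-1] * d[1:], axis=1)))   # mean pairwise cosine

# Embeddings moving along a straight line score ~1.0
line = np.outer(np.arange(5.0), np.ones(3))
print(round(straightness(line), 4))  # → 1.0
```

Because every operation here is differentiable (away from zero-length displacements), the same expression written in an autodiff framework can be maximized directly, or its negation added to an existing SSL loss as a plug-in regularizer.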
📝 Abstract
Prediction is a fundamental capability of all living organisms, and has been proposed as an objective for learning sensory representations. Recent work demonstrates that in primate visual systems, prediction is facilitated by neural representations that follow straighter temporal trajectories than their initial photoreceptor encoding, which allows for prediction by linear extrapolation. Inspired by these experimental findings, we develop a self-supervised learning (SSL) objective that explicitly quantifies and promotes straightening. We demonstrate the power of this objective in training deep feedforward neural networks on smoothly rendered synthetic image sequences that mimic commonly occurring properties of natural videos. The learned model contains neural embeddings that are predictive, but also factorize the geometric, photometric, and semantic attributes of objects. The representations also prove more robust to noise and adversarial attacks than those of previous SSL methods that optimize for invariance to random augmentations. Moreover, these beneficial properties can be transferred to other training procedures by using the straightening objective as a regularizer, suggesting a broader utility for straightening as a principle for robust unsupervised learning.