Self-Supervised Learning from Structural Invariance

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a challenge in self-supervised learning: a single input often corresponds to multiple semantically consistent yet structurally diverse targets, such as consecutive video frames, so existing methods model the resulting conditional uncertainty poorly. The authors propose AdaSSL, a general-purpose regularization mechanism that introduces a latent variable to account for this uncertainty and derives a variational lower bound on the mutual information between paired embeddings. By maximizing this bound, AdaSSL captures the multimodal nature of the target distribution while preserving the structural invariances that SSL relies on. The method integrates into both contrastive learning and knowledge distillation frameworks, and experiments across causal representation learning, fine-grained image understanding, and video world modeling show consistent gains, underscoring AdaSSL's generality and effectiveness.

📝 Abstract
Joint-embedding self-supervised learning (SSL), the key paradigm for unsupervised representation learning from visual data, learns from invariances between semantically-related data pairs. We study the one-to-many mapping problem in SSL, where each datum may be mapped to multiple valid targets. This arises when data pairs come from naturally occurring generative processes, e.g., successive video frames. We show that existing methods struggle to flexibly capture this conditional uncertainty. As a remedy, we introduce a latent variable to account for this uncertainty and derive a variational lower bound on the mutual information between paired embeddings. Our derivation yields a simple regularization term for standard SSL objectives. The resulting method, which we call AdaSSL, applies to both contrastive and distillation-based SSL objectives, and we empirically show its versatility in causal representation learning, fine-grained image understanding, and world modeling on videos.
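The abstract's key mechanism is a variational lower bound on the mutual information between paired embeddings. A standard instance of such a bound is the InfoNCE estimator, which bounds I(A; B) from below by log(N) minus an in-batch classification loss. The sketch below is illustrative only, assuming an InfoNCE-style estimator in NumPy; the paper's actual AdaSSL regularizer (and its latent-variable formulation) is not specified here, and `infonce_lower_bound` is a hypothetical name:

```python
import numpy as np

def infonce_lower_bound(z_a, z_b, temperature=0.1):
    """InfoNCE-style variational lower bound on the mutual information
    between paired embeddings z_a[i] <-> z_b[i].

    Uses the bound I(A; B) >= log(N) - L_InfoNCE, where L_InfoNCE is the
    cross-entropy of matching each z_a[i] to its positive z_b[i] against
    the other in-batch pairs acting as negatives.
    """
    # L2-normalize so inner products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    logits = z_a @ z_b.T / temperature            # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    nce_loss = -np.mean(np.diag(log_probs))       # cross-entropy on positives
    n = z_a.shape[0]
    return np.log(n) - nce_loss                   # lower bound on I(A; B)
```

Because the cross-entropy term is non-negative, the estimate is capped at log(N); well-aligned pairs push the bound toward that cap, while unrelated pairs drive it toward zero. In a training objective, the negated bound would serve as the regularization term added to a contrastive or distillation loss.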
Problem

Research questions and friction points this paper is trying to address.

self-supervised learning
one-to-many mapping
conditional uncertainty
structural invariance
representation learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-supervised learning
one-to-many mapping
latent variable
mutual information
variational lower bound