AdaDim: Dimensionality Adaptation for SSL Representational Dynamics

📅 2025-05-18
🤖 AI Summary
This work addresses the dimensionality collapse problem in self-supervised learning (SSL) by investigating the dynamic coupling between representation entropy $H(R)$ and representation-embedding mutual information $I(R;Z)$. We discover a non-monotonic relationship: optimal downstream performance occurs not at extremes but at an intermediate balance point. Crucially, we reveal for the first time that increasing $H(R)$ early in training boosts $I(R;Z)$, whereas doing so late impairs it. Based on this insight, we propose AdaDim—a method that employs information-theoretic, dynamically weighted losses to adaptively balance feature decorrelation and sample uniformity. AdaDim further integrates projection-head optimization and geometry-aware training scheduling. On ImageNet-1K linear evaluation, it significantly outperforms baselines including SimCLR and BYOL. Empirical results robustly confirm that the equilibrium state of the $H(R)/I(R;Z)$ ratio strongly correlates with downstream task performance.

📝 Abstract
A key factor in effective self-supervised learning (SSL) is preventing dimensional collapse, where a higher-dimensional representation space spans only a lower-dimensional subspace. SSL optimization strategies therefore guide a model to produce representations ($R$) with higher dimensionality. Dimensionality is optimized either through a dimension-contrastive approach that encourages feature decorrelation or through a sample-contrastive method that promotes a uniform spread of sample representations. Both families of SSL algorithms also utilize a projection head that maps $R$ into a lower-dimensional embedding space $Z$. Recent work has characterized the projection head as a filter that removes features irrelevant to the SSL objective by reducing the mutual information $I(R;Z)$. The current view in the literature is therefore that a good SSL representation space should have a high $H(R)$ and a low $I(R;Z)$. However, this view lacks an understanding of the underlying training dynamics that influence both terms, as well as of how the values of $H(R)$ and $I(R;Z)$ reached by the end of training reflect the downstream performance of an SSL model. We address both gaps by demonstrating that increases in $H(R)$ due to feature decorrelation at the start of training lead to a higher $I(R;Z)$, while increases in $H(R)$ due to samples distributing uniformly in a high-dimensional space at the end of training cause $I(R;Z)$ to plateau or decrease. Furthermore, our analysis shows that the best-performing SSL models have neither the highest $H(R)$ nor the lowest $I(R;Z)$, but arrive at an optimal intermediate point for both. We develop a method called AdaDim to exploit these observed training dynamics by adaptively weighting between losses based on feature decorrelation and uniform sample spread.
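The two loss families the abstract contrasts, and the adaptive blend between them, can be sketched in a few lines. This is an illustrative reconstruction, not the authors' exact formulation: the decorrelation term follows the common Barlow Twins-style off-diagonal covariance penalty, the uniformity term follows the widely used log-mean Gaussian-potential form, and `alpha` is a hypothetical schedule parameter standing in for AdaDim's adaptive weight.

```python
import numpy as np

def decorrelation_loss(r):
    """Dimension-contrastive term (illustrative): penalize off-diagonal
    entries of the feature correlation matrix so features decorrelate."""
    r = (r - r.mean(axis=0)) / (r.std(axis=0) + 1e-8)
    c = (r.T @ r) / len(r)                 # feature correlation matrix
    off_diag = c - np.diag(np.diag(c))
    return float((off_diag ** 2).sum())

def uniformity_loss(z, t=2.0):
    """Sample-contrastive term (illustrative): log of the mean pairwise
    Gaussian potential, minimized when samples spread uniformly on the
    unit hypersphere."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sq_dists = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    mask = ~np.eye(len(z), dtype=bool)     # exclude self-distances
    return float(np.log(np.exp(-t * sq_dists[mask]).mean()))

def adadim_style_loss(r, z, alpha):
    """Hypothetical adaptive blend: alpha in [0, 1] shifts weight from
    feature decorrelation (early training) toward sample uniformity
    (late training), mirroring the dynamics the paper describes."""
    return (1 - alpha) * decorrelation_loss(r) + alpha * uniformity_loss(z)
```

In an actual training loop, `alpha` would be driven by the observed training dynamics rather than a fixed schedule; the point of the sketch is only the weighted combination of the two loss families.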
Problem

Research questions and friction points this paper is trying to address.

Preventing dimensional collapse in SSL representation spaces
Understanding dynamics between H(R) and I(R;Z) in SSL training
Optimizing SSL performance via adaptive loss weighting (AdaDim)
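Tracking the dynamics of $H(R)$ requires a measurable proxy for representation entropy. A common choice in the dimensionality-collapse literature (assumed here, not confirmed as the paper's exact estimator) is the effective rank of the covariance spectrum: the exponentiated Shannon entropy of the normalized eigenvalues.

```python
import numpy as np

def effective_rank(r, eps=1e-12):
    """Proxy for representation entropy H(R): exponentiated Shannon
    entropy of the normalized covariance eigenvalue spectrum.
    A collapsed representation concentrates its spectrum on a few
    eigenvalues and scores close to 1; a well-spread one approaches
    the ambient dimensionality."""
    r = r - r.mean(axis=0)
    cov = (r.T @ r) / (len(r) - 1)
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    p = eigvals / (eigvals.sum() + eps)    # normalized spectrum
    entropy = -(p * np.log(p + eps)).sum()
    return float(np.exp(entropy))
```

Logging this quantity for $R$ over training is one simple way to observe the early-rise, late-plateau behavior the paper analyzes.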
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive weighting between feature decorrelation and uniform spread losses
Dynamic adjustment of representation dimensionality during training
Optimal balance between high H(R) and low I(R;Z)