🤖 AI Summary
Non-contrastive self-supervised learning suffers from four known failure modes: representation collapse, dimensional collapse, cluster collapse, and intracluster collapse. To address them, this paper proposes FALCON, a theoretically grounded framework built on a principled design of the projector and the loss function. Without explicitly enforcing decorrelation or clustering, the design introduces an inductive bias that mitigates all four collapse modes and yields representations that are both decorrelated and clustered. The method is validated on SVHN, CIFAR-10, CIFAR-100, and ImageNet-100, where it outperforms existing feature-decorrelation and cluster-based approaches on downstream clustering and linear classification. The paper derives sufficient conditions for avoiding collapse and reports improved generalization across these benchmarks.
📝 Abstract
We identify sufficient conditions for avoiding known failure modes of non-contrastive self-supervised learning, including representation, dimensional, cluster, and intracluster collapse. Based on these findings, we propose a principled design for the projector and loss function. We theoretically demonstrate that this design introduces an inductive bias that promotes learning representations that are both decorrelated and clustered without explicitly enforcing these properties, leading to improved generalization. To the best of our knowledge, this is the first solution that achieves robust training with respect to these failure modes while guaranteeing enhanced generalization performance in downstream tasks. We validate our theoretical findings on image datasets including SVHN, CIFAR10, CIFAR100, and ImageNet-100, and show that our solution, dubbed FALCON, outperforms existing feature decorrelation and cluster-based self-supervised learning methods in terms of generalization to clustering and linear classification tasks.