Infinite Width Limits of Self Supervised Neural Networks

📅 2024-11-17
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the theoretical characterization of wide neural networks under self-supervised learning in the infinite-width limit, specifically asking whether networks trained with the Barlow Twins loss adhere to Neural Tangent Kernel (NTK) dynamics. Method: Leveraging NTK theory, functional analysis, and probabilistic limit tools, the authors develop a novel NTK convergence framework tailored to self-supervised architectures. Contribution/Results: They provide the first rigorous proof that, for two-layer networks optimized under the Barlow Twins objective, the NTK converges to a constant kernel as width tends to infinity, addressing the common but previously unproven assumption that NTK constancy holds irrespective of the training loss. Furthermore, they derive generalization error bounds for the limiting kernel model and establish quantitative links between its performance and that of finite-width networks. This work furnishes the first rigorous kernel-theoretic foundation for self-supervised wide networks, offering both theoretical novelty and interpretable guidance for algorithm design.
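For concreteness, the two objects the summary refers to can be written out. The notation below is the standard one from the Barlow Twins and NTK literature, not copied from the paper: λ is the usual off-diagonal weight, z^A and z^B are the embeddings of two augmented views, and b indexes the batch.

```latex
% Barlow Twins loss on the cross-correlation matrix C of two embedded views
\mathcal{L}_{\mathrm{BT}} = \sum_{i} \left(1 - C_{ii}\right)^{2}
  + \lambda \sum_{i \neq j} C_{ij}^{2},
\qquad
C_{ij} = \frac{\sum_{b} z^{A}_{b,i}\, z^{B}_{b,j}}
              {\sqrt{\sum_{b} \left(z^{A}_{b,i}\right)^{2}}\,
               \sqrt{\sum_{b} \left(z^{B}_{b,j}\right)^{2}}}

% Empirical NTK at training time t; the paper's main result is that, in the
% infinite-width limit, \Theta_t stays at its initial value \Theta_0
\Theta_{t}(x, x') = \nabla_{\theta} f_{\theta_t}(x)\,
                    \nabla_{\theta} f_{\theta_t}(x')^{\top}
```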

๐Ÿ“ Abstract
The NTK is a widely used tool in the theoretical analysis of deep learning, allowing us to look at supervised deep neural networks through the lens of kernel regression. Recently, several works have investigated kernel models for self-supervised learning, hypothesizing that these also shed light on the behavior of wide neural networks by virtue of the NTK. However, it remains an open question to what extent this connection is mathematically sound; it is a commonly encountered misbelief that the kernel behavior of wide neural networks emerges irrespective of the loss function they are trained on. In this paper, we bridge the gap between the NTK and self-supervised learning, focusing on two-layer neural networks trained under the Barlow Twins loss. We prove that the NTK of Barlow Twins indeed becomes constant as the width of the network approaches infinity. Our analysis technique differs somewhat from previous works on the NTK and may be of independent interest. Overall, our work provides a first justification for the use of classic kernel theory to understand self-supervised learning of wide neural networks. Building on this result, we derive generalization error bounds for kernelized Barlow Twins and connect them to neural networks of finite width.
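To make the abstract's setting concrete, here is a minimal, self-contained JAX sketch of the objects involved: a two-layer ReLU network in NTK parameterization, the Barlow Twins loss, and the empirical NTK. All sizes (WIDTH, DIM, OUT, LAM) and the synthetic "augmented views" are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch, assuming synthetic data and illustrative sizes; this is
# not the authors' code, only the standard objects their analysis concerns.
import jax
import jax.numpy as jnp

WIDTH, DIM, OUT, LAM = 1024, 8, 4, 5e-3  # hypothetical sizes

def init_params(key):
    k1, k2 = jax.random.split(key)
    # NTK parameterization: O(1) Gaussian weights, explicit scaling inside f.
    return {"W1": jax.random.normal(k1, (WIDTH, DIM)),
            "W2": jax.random.normal(k2, (OUT, WIDTH))}

def f(params, x):
    # Two-layer ReLU network, the architecture class the paper analyzes.
    h = jax.nn.relu(params["W1"] @ x / jnp.sqrt(DIM))
    return params["W2"] @ h / jnp.sqrt(WIDTH)

def barlow_twins_loss(params, xa, xb, lam=LAM):
    za = jax.vmap(lambda x: f(params, x))(xa)    # embeddings of view A
    zb = jax.vmap(lambda x: f(params, x))(xb)    # embeddings of view B
    za = (za - za.mean(0)) / (za.std(0) + 1e-8)  # standardize per dimension
    zb = (zb - zb.mean(0)) / (zb.std(0) + 1e-8)
    c = za.T @ zb / xa.shape[0]                  # cross-correlation matrix
    on_diag = jnp.sum((1.0 - jnp.diag(c)) ** 2)
    off_diag = jnp.sum(c ** 2) - jnp.sum(jnp.diag(c) ** 2)
    return on_diag + lam * off_diag

def empirical_ntk(params, x1, x2):
    # Theta(x1, x2) = sum over parameter blocks of J_f(x1) J_f(x2)^T.
    j1 = jax.jacobian(f)(params, x1)
    j2 = jax.jacobian(f)(params, x2)
    return sum(jnp.tensordot(j1[k], j2[k], axes=([1, 2], [1, 2])) for k in j1)

params = init_params(jax.random.PRNGKey(0))
xa = jax.random.normal(jax.random.PRNGKey(1), (32, DIM))
xb = xa + 0.1 * jax.random.normal(jax.random.PRNGKey(2), (32, DIM))  # toy "augmentation"
print(barlow_twins_loss(params, xa, xb))          # scalar loss
print(empirical_ntk(params, xa[0], xa[1]).shape)  # (OUT, OUT) kernel block
```

Under the paper's result, recomputing empirical_ntk before and after gradient training on this loss should show vanishing drift as WIDTH grows; that constancy is exactly what licenses replacing the network with the fixed kernel Θ₀ in a classical kernel analysis.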
Problem

Research questions and friction points this paper is trying to address.

The unproven link between NTK theory and self-supervised learning
Behavior of the Barlow Twins loss in the wide-network limit
Derivation of generalization error bounds for the limiting kernel model
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bridging the NTK and self-supervised learning
Proof that the NTK becomes constant in the infinite-width limit
Generalization bounds for kernelized Barlow Twins (a generic bound of this type is sketched below)
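The paper's precise bound is not reproduced here; bounds of this kind are usually instantiations of the standard Rademacher-complexity bound for a norm ball in the RKHS of the limiting kernel Θ. A generic version (standard kernel theory in my notation, not the paper's statement) is:

```latex
% Empirical Rademacher complexity of the radius-B ball in the RKHS of the
% limiting kernel Theta; K is the n x n kernel matrix on the sample.
\widehat{\mathfrak{R}}_{n}\!\left(\left\{ f \in \mathcal{H}_{\Theta} :
   \|f\|_{\mathcal{H}_{\Theta}} \le B \right\}\right)
  \le \frac{B}{n}\sqrt{\operatorname{tr}(K)}
```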
Maximilian Fleissner
PhD at Technical University of Munich
machine learning · statistics · explainable machine learning
Gautham Govind Anil
IIT Madras
D. Ghoshdastidar
Technical University of Munich