High-dimensional learning of narrow neural networks

📅 2024-09-20
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This manuscript investigates the learning of narrow-width neural networks—such as MLPs, autoencoders, and attention modules—in high-dimensional, data-rich regimes. To unify generalization analyses across architectures with a finite number of hidden units and across diverse learning paradigms—including (un)supervised learning, denoising, and contrastive learning—it introduces the **sequence multi-index model**, a generic model that encompasses many previously studied settings as special instances and enables a systematic, unified characterization of narrow networks across heterogeneous tasks. Methodologically, it combines statistical-physics tools (the replica method), approximate message passing (AMP), and high-dimensional asymptotic analysis to derive sharp characterizations of performance in the joint limit of large data dimension and proportionally large sample size. The resulting framework offers an analytically tractable, unified account of both the generalization behavior and the learning dynamics of narrow-width networks, addressing a gap in modern neural network theory concerning width-constrained settings.
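As an illustration of the AMP machinery the summary refers to, here is a minimal sketch of approximate message passing on a toy sparse linear-estimation problem. This is a standard textbook instance, not the paper's sequence multi-index setting; the dimensions, the soft-thresholding denoiser, and the threshold rule are all our own choices for the sketch.

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding denoiser eta(x; t)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(1)
d, n, s = 500, 250, 25                 # signal dim, measurements, sparsity
delta = n / d                          # measurement ratio

A = rng.standard_normal((n, d)) / np.sqrt(n)   # sensing matrix, ~unit-norm columns
x0 = np.zeros(d)
x0[rng.choice(d, s, replace=False)] = rng.standard_normal(s)
y = A @ x0                             # noiseless measurements

x, z = np.zeros(d), y.copy()
for _ in range(30):
    r = A.T @ z + x                    # effective (pseudo-)data for the denoiser
    tau = np.sqrt(np.mean(z ** 2))     # running estimate of the effective noise level
    x_new = soft(r, tau)
    # Onsager correction: (1/delta) * z * average derivative of the denoiser,
    # which for soft thresholding is the fraction of surviving coordinates.
    onsager = (z / delta) * np.mean(np.abs(x_new) > 0)
    z = y - A @ x_new + onsager
    x = x_new
```

The Onsager term is what distinguishes AMP from plain iterative thresholding: it keeps the effective noise in `r` approximately Gaussian across iterations, which is also what makes the algorithm's state evolution exactly trackable in the high-dimensional limit.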

📝 Abstract
Recent years have been marked by the fast-paced diversification and increasing ubiquity of machine learning applications. Yet, a firm theoretical understanding of the surprising efficiency of neural networks to learn from high-dimensional data still proves largely elusive. In this endeavour, analyses inspired by statistical physics have proven instrumental, enabling the tight asymptotic characterization of the learning of neural networks in high dimensions, for a broad class of solvable models. This manuscript reviews the tools and ideas underlying recent progress in this line of work. We introduce a generic model -- the sequence multi-index model -- which encompasses numerous previously studied models as special instances. This unified framework covers a broad class of machine learning architectures with a finite number of hidden units, including multi-layer perceptrons, autoencoders, attention mechanisms; and tasks, including (un)supervised learning, denoising, contrastive learning, in the limit of large data dimension, and comparably large number of samples. We explicate in full detail the analysis of the learning of sequence multi-index models, using statistical physics techniques such as the replica method and approximate message-passing algorithms. This manuscript thus provides a unified presentation of analyses reported in several previous works, and a detailed overview of central techniques in the field of statistical physics of machine learning. This review should be a useful primer for machine learning theoreticians curious about statistical physics approaches; it should also be of value to statistical physicists interested in the transfer of such ideas to the study of neural networks.
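To make the "multi-index" structure in the abstract concrete, here is a minimal toy sketch, with made-up dimensions and link function rather than the paper's full sequence multi-index model: labels in large input dimension d depend on the input only through a finite number k of linear projections, while the sample size n grows proportionally to d.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 1000, 3                    # large input dimension, finite number of hidden units
alpha = 2.0                       # sample ratio n/d, held fixed as d grows
n = int(alpha * d)

# Teacher weights, scaled so each projection <w_a, x> is of order one.
W_star = rng.standard_normal((k, d)) / np.sqrt(d)
X = rng.standard_normal((n, d))   # i.i.d. Gaussian inputs

# Multi-index property: labels depend on x only through the k projections.
Z = X @ W_star.T                  # shape (n, k), entries of order one
y = np.tanh(Z).sum(axis=1)        # illustrative (not the paper's) link function
```

The point of the scaling is that even though d is large, the k-dimensional projections `Z` remain O(1), so the learning problem is governed by a finite set of low-dimensional order parameters — the structure that the replica and AMP analyses exploit.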
Problem

Research questions and friction points this paper is trying to address.

Neural Networks
High-dimensional Data
Big Data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified Framework
Statistical Physics Methods
Neural Network Learning Characteristics