Adaptive kernel predictors from feature-learning infinite limits of neural networks

📅 2025-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates neural networks in the feature-learning infinite-width limit and asks whether such networks can still be characterized as kernel machines, now with data-adaptive kernels. The answer is yes: two explicit, computationally tractable adaptive kernel predictors are derived. The first comes from the large-width limit of feature-learning Bayesian networks, whose saddle-point equations yield a min-max optimization problem defining the predictor; the second comes from gradient-flow training with weight decay, analyzed through dynamical mean-field theory (DMFT), whose fixed-point equations define the task-adapted internal representations and the predictor. Unlike the static, data-agnostic kernels of the "lazy-training" regime, these kernels adapt to the task, and experiments on benchmark datasets show that the adaptive kernel predictors achieve lower test loss than their lazy-regime counterparts.
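To make the lazy-versus-adaptive distinction concrete, here is a minimal, hypothetical sketch (not the paper's construction): the same kernel-regression predictor is built twice, once from features frozen at random initialization and once from features after ordinary gradient training, so that the second kernel depends on the task. The toy two-layer tanh network, the ridge parameter, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def kernel_predict(K_train, k_test_train, y_train, ridge=1e-3):
    """Generic kernel predictor: f(x) = k(x, X) (K + ridge*I)^{-1} y."""
    alpha = np.linalg.solve(K_train + ridge * np.eye(len(y_train)), y_train)
    return k_test_train @ alpha

rng = np.random.default_rng(0)
d, m, n = 5, 64, 100                                    # toy sizes (illustrative)
X, y = rng.normal(size=(n, d)), rng.normal(size=n)
X_test = rng.normal(size=(20, d))

# Fixed ("lazy") kernel: built from hidden features at random initialization.
W = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.normal(size=m) / np.sqrt(m)
H0, H0_test = np.tanh(X @ W.T), np.tanh(X_test @ W.T)
pred_lazy = kernel_predict(H0 @ H0.T, H0_test @ H0.T, y)

# Adaptive kernel: train the whole network, then rebuild the kernel from the
# learned features, so the kernel depends on the training data and targets.
lr = 0.05
for _ in range(500):
    H = np.tanh(X @ W.T)                                # hidden features
    resid = H @ a - y                                   # prediction error
    grad_a = H.T @ resid / n
    grad_W = ((resid[:, None] * a) * (1 - H**2)).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W

H, H_test = np.tanh(X @ W.T), np.tanh(X_test @ W.T)
pred_adaptive = kernel_predict(H @ H.T, H_test @ H.T, y)  # data-dependent kernel
```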

📝 Abstract
Previous influential work showed that infinite width limits of neural networks in the lazy training regime are described by kernel machines. Here, we show that neural networks trained in the rich, feature learning infinite-width regime in two different settings are also described by kernel machines, but with data-dependent kernels. For both cases, we provide explicit expressions for the kernel predictors and prescriptions to numerically calculate them. To derive the first predictor, we study the large-width limit of feature-learning Bayesian networks, showing how feature learning leads to task-relevant adaptation of layer kernels and preactivation densities. The saddle point equations governing this limit result in a min-max optimization problem that defines the kernel predictor. To derive the second predictor, we study gradient flow training of randomly initialized networks trained with weight decay in the infinite-width limit using dynamical mean field theory (DMFT). The fixed point equations of the resulting DMFT define the task-adapted internal representations and the kernel predictor. We compare our kernel predictors to kernels derived from the lazy regime and demonstrate that our adaptive kernels achieve lower test loss on benchmark datasets.
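For orientation, both predictors take the standard kernel-machine form below. This is generic kernel-regression notation (λ is an assumed ridge regularizer, not necessarily the paper's exact regularization); the difference between the regimes lies entirely in how the kernel K is obtained.

```latex
% Generic kernel-machine form of a predictor (standard notation, not the paper's exact equations).
f(x) \;=\; \sum_{\mu=1}^{P} \alpha_\mu \, K(x, x_\mu),
\qquad
\boldsymbol{\alpha} \;=\; (\mathbf{K} + \lambda \mathbf{I})^{-1} \mathbf{y},
\qquad
\mathbf{K}_{\mu\nu} \;=\; K(x_\mu, x_\nu).
% Lazy regime:             K is data-agnostic (fixed NTK / NNGP kernel set at initialization).
% Feature-learning regime: K is determined self-consistently from the training data
%                          (saddle-point / DMFT fixed-point equations), so it adapts to the task.
```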
Problem

Research questions and friction points this paper is trying to address.

Do neural networks in the rich, feature-learning infinite-width regime admit explicit kernel predictors?
How does feature learning adapt layer kernels and internal representations to the task in this limit?
Can data-dependent kernels achieve lower test loss than static, lazy-regime kernels?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explicit data-dependent kernel predictors for two feature-learning infinite-width settings
Min-max optimization problem arising from the saddle-point equations of the Bayesian wide-network limit
Dynamical mean-field theory (DMFT) fixed-point equations for gradient-flow training with weight decay (a generic fixed-point sketch follows this list)
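The sketch below shows the kind of damped fixed-point iteration commonly used to solve self-consistent kernel equations numerically; it is a generic placeholder, not the paper's DMFT or saddle-point system, and the update map, damping factor, and toy data are all assumptions made for illustration.

```python
import numpy as np

def solve_self_consistent_kernel(K_init, update_fn, damping=0.5, tol=1e-8, max_iter=500):
    """Damped fixed-point iteration: K <- (1 - damping)*K + damping*update_fn(K).

    `update_fn` stands in for an actual fixed-point map (e.g. one derived from
    DMFT-style self-consistency conditions); damping stabilizes the iteration.
    """
    K = K_init.copy()
    for _ in range(max_iter):
        K_new = (1.0 - damping) * K + damping * update_fn(K)
        if np.max(np.abs(K_new - K)) < tol:
            return K_new
        K = K_new
    return K

# Illustrative placeholder update: mix the initial kernel with a rank-one component
# aligned with K @ y, so the fixed point depends on both the inputs and the targets.
rng = np.random.default_rng(1)
X, y = rng.normal(size=(40, 3)), rng.normal(size=40)
K0 = np.tanh(X @ X.T)                               # stand-in for the initial (lazy) kernel
update = lambda K: 0.8 * K0 + 0.2 * np.outer(K @ y, K @ y) / (y @ K @ K @ y)

K_adapted = solve_self_consistent_kernel(K0, update)
```

Damping (partial updates toward the new iterate) is a standard numerical choice for this kind of self-consistent solve; without it, fixed-point maps of coupled kernel equations often oscillate or diverge.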