Transfer Learning in Infinite Width Feature Learning Networks

📅 2025-07-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates transfer learning in infinitely wide neural networks when both the source and downstream tasks operate in the feature-learning regime. Feature reuse across pretraining and fine-tuning is governed by an elastic weight coupling that sets how strongly the fine-tuned network relies on features learned on the source task. The analysis covers both a Bayesian setting, where learning is described by a posterior distribution over the weights, and gradient-flow training with weight decay; in both, the infinite-width limit yields adapted feature kernels whose structure depends explicitly on the data distributions and label geometries of the source and target tasks. The theory is applied to linear and polynomial regression and validated on real-world datasets, characterizing how coupling strength, feature-learning strength, dataset size, and source-target task alignment jointly determine the utility of transfer learning.
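To make the elastic weight coupling concrete, here is a minimal sketch that fine-tunes a linear model on a target task while a quadratic penalty pulls the weights back toward the source solution. The penalty form, the coupling strength `gamma`, and all names are illustrative assumptions, not the paper's exact construction, which operates on infinite-width networks rather than a linear model.

```python
import numpy as np

def finetune_with_elastic_coupling(X_tgt, y_tgt, w_src,
                                   gamma=1.0, lr=1e-2, steps=2000):
    """Gradient descent on the target task with an elastic weight coupling.

    Loss: (1/2n) * ||X w - y||^2 + (gamma/2) * ||w - w_src||^2.
    gamma = 0 recovers unconstrained fine-tuning; large gamma keeps the
    weights (and hence the features) close to the pretrained source solution.
    """
    w = w_src.copy()                                   # start from source weights
    n = len(y_tgt)
    for _ in range(steps):
        grad_fit = X_tgt.T @ (X_tgt @ w - y_tgt) / n   # target-task gradient
        grad_couple = gamma * (w - w_src)              # elastic pull toward source
        w -= lr * (grad_fit + grad_couple)
    return w

# Toy usage (hypothetical setup): a target task partially aligned with the source.
rng = np.random.default_rng(0)
d, n = 20, 50
w_src = rng.normal(size=d)                       # stand-in for pretrained weights
w_tgt = 0.8 * w_src + 0.2 * rng.normal(size=d)   # aligned target weights
X = rng.normal(size=(n, d))
y = X @ w_tgt + 0.1 * rng.normal(size=n)
w_hat = finetune_with_elastic_coupling(X, y, w_src, gamma=0.5)
```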

📝 Abstract
We develop a theory of transfer learning in infinitely wide neural networks where both the pretraining (source) and downstream (target) tasks can operate in a feature learning regime. We analyze both the Bayesian framework, where learning is described by a posterior distribution over the weights, and gradient flow training of randomly initialized networks trained with weight decay. Both settings track how representations evolve in the source and target tasks. The summary statistics of these theories are adapted feature kernels which, after transfer learning, depend on data and labels from both source and target tasks. Reuse of features during transfer learning is controlled by an elastic weight coupling, which sets the reliance of the network on features learned during training on the source task. We apply our theory to linear and polynomial regression tasks as well as real datasets. Our theory and experiments reveal interesting interplays between elastic weight coupling, feature learning strength, dataset size, and source-target task alignment in determining the utility of transfer learning.
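As a rough illustration of the role the adapted feature kernel plays as a summary statistic, the sketch below builds a kernel from features shaped by source training and makes target-task predictions via kernel ridge regression. The tanh feature map, the width normalization, and the ridge parameter are illustrative assumptions; the paper derives its adapted kernels analytically in the infinite-width limit rather than from an explicit finite feature map.

```python
import numpy as np

def adapted_feature_kernel(X1, X2, W):
    """Toy adapted kernel: inner products of post-activation features under
    weights W that were shaped by training on the source task."""
    f1, f2 = np.tanh(X1 @ W), np.tanh(X2 @ W)
    return f1 @ f2.T / W.shape[1]                      # normalize by width

def kernel_ridge_predict(K_train, K_cross, y_train, ridge=1e-3):
    """Kernel ridge regression: alpha = (K + ridge * I)^{-1} y."""
    alpha = np.linalg.solve(K_train + ridge * np.eye(len(y_train)), y_train)
    return K_cross @ alpha

# Toy usage (hypothetical setup): predict target labels through the
# source-adapted kernel.
rng = np.random.default_rng(1)
d, width, n_tr, n_te = 10, 200, 40, 10
W_src = rng.normal(size=(d, width)) / np.sqrt(d)   # stand-in for adapted weights
X_tr, X_te = rng.normal(size=(n_tr, d)), rng.normal(size=(n_te, d))
y_tr = np.tanh(X_tr @ W_src).mean(axis=1)          # labels correlated with features
K_tr = adapted_feature_kernel(X_tr, X_tr, W_src)
K_te = adapted_feature_kernel(X_te, X_tr, W_src)
y_pred = kernel_ridge_predict(K_tr, K_te, y_tr)
```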
Problem

Research questions and friction points this paper is trying to address.

Analyze transfer learning in infinite width neural networks
Study feature kernel adaptation from source to target tasks
Examine the interplay of elastic weight coupling and task alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature learning in infinite-width neural networks
Elastic weight coupling controls feature reuse
Adapted feature kernels from source and target