AI Summary
This paper investigates the primary source of neural network inductive bias, specifically examining the relative roles of architectural design versus initial weight configuration. We employ a variant of Model-Agnostic Meta-Learning (MAML) to jointly optimize task-adapted initializations across diverse architectures -- including MLPs, CNNs, LSTMs, and Transformers -- and systematically evaluate their impact on learning dynamics and generalization across three distinct generalization settings. Key findings are: (1) Meta-learned initializations substantially reduce inter-architecture performance disparities -- by an average of 76% across 430 experiments -- with some cases eliminating differences entirely, challenging the conventional view that architecture dominates inductive bias; (2) generalization performance critically depends on the coverage of the meta-training task distribution, with all architectures exhibiting significant degradation on out-of-distribution tasks. These results establish initial weights as a tunable, high-impact source of inductive bias -- complementing and, in many cases, superseding architectural constraints.
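The MAML-style procedure described above can be illustrated with a minimal first-order sketch: an inner loop adapts a shared initialization to a sampled task with one gradient step, and an outer loop updates that initialization on held-out query data. This is a hypothetical toy example (1-D linear regression tasks, first-order approximation), not the paper's actual setup; the task family, learning rates, and loop counts are all illustrative assumptions.

```python
# First-order MAML sketch (illustrative, NOT the paper's code):
# meta-learn an initial weight for 1-D linear regression tasks y = a * x,
# where each sampled task uses a different slope a.
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 0.1, 0.01  # inner-loop / outer-loop learning rates (assumed)

def loss_grad(w, x, y):
    """Gradient of the mean squared error 0.5 * mean((w*x - y)^2) w.r.t. w."""
    return np.mean((w * x - y) * x)

w0 = 0.0  # the meta-learned initialization
for _ in range(2000):
    a = rng.uniform(0.5, 2.0)                      # sample a task (slope)
    x_s, x_q = rng.normal(size=5), rng.normal(size=5)
    y_s, y_q = a * x_s, a * x_q
    # Inner loop: one adaptation step from the shared initialization
    w_adapted = w0 - alpha * loss_grad(w0, x_s, y_s)
    # Outer loop: first-order update of the initialization on the query loss
    w0 -= beta * loss_grad(w_adapted, x_q, y_q)
```

After meta-training, a single inner-loop step from `w0` should already reduce the loss on a fresh task from the same family, which is the sense in which the initialization (rather than the architecture) carries the inductive bias.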
Abstract
Artificial neural networks can acquire many aspects of human knowledge from data, making them promising as models of human learning. But what those networks can learn depends upon their inductive biases -- the factors other than the data that influence the solutions they discover -- and the inductive biases of neural networks remain poorly understood, limiting our ability to draw conclusions about human learning from the performance of these systems. Cognitive scientists and machine learning researchers often focus on the architecture of a neural network as a source of inductive bias. In this paper we explore the impact of another source of inductive bias -- the initial weights of the network -- using meta-learning as a tool for finding initial weights that are adapted for specific problems. We evaluate four widely-used architectures -- MLPs, CNNs, LSTMs, and Transformers -- by meta-training 430 different models across three tasks requiring different biases and forms of generalization. We find that meta-learning can substantially reduce or entirely eliminate performance differences across architectures and data representations, suggesting that these factors may be less important as sources of inductive bias than is typically assumed. When differences are present, architectures and data representations that perform well without meta-learning tend to meta-train more effectively. Moreover, all architectures generalize poorly on problems that are far from their meta-training experience, underscoring the need for stronger inductive biases for robust generalization.