AI Summary
This paper investigates the primary source of neural network inductive bias, specifically examining the relative roles of architectural design versus initial weight configuration. We employ a variant of Model-Agnostic Meta-Learning (MAML) to jointly optimize task-adapted initializations across diverse architectures -- including MLPs, CNNs, LSTMs, and Transformers -- and systematically evaluate their impact on learning dynamics and generalization across three distinct generalization settings. Key findings are: (1) Meta-learned initializations substantially reduce inter-architecture performance disparities -- by an average of 76% across 430 experiments -- with some cases eliminating differences entirely, challenging the conventional view that architecture dominates inductive bias; (2) generalization performance critically depends on the coverage of the meta-training task distribution, with all architectures exhibiting significant degradation on out-of-distribution tasks. These results establish initial weights as a tunable, high-impact source of inductive bias -- complementing and, in many cases, superseding architectural constraints.
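The MAML-style procedure described above can be illustrated with a minimal first-order sketch: an inner loop adapts a shared initialization to a sampled task with one gradient step, and an outer loop updates that initialization on held-out query data. This is a hypothetical toy example (1-D linear regression tasks, first-order approximation), not the paper's actual setup; the task family, learning rates, and loop counts are all illustrative assumptions.

```python
# First-order MAML sketch (illustrative, NOT the paper's code):
# meta-learn an initial weight for 1-D linear regression tasks y = a * x,
# where each sampled task uses a different slope a.
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 0.1, 0.01  # inner-loop / outer-loop learning rates (assumed)

def loss_grad(w, x, y):
    """Gradient of the mean squared error 0.5 * mean((w*x - y)^2) w.r.t. w."""
    return np.mean((w * x - y) * x)

w0 = 0.0  # the meta-learned initialization
for _ in range(2000):
    a = rng.uniform(0.5, 2.0)                      # sample a task (slope)
    x_s, x_q = rng.normal(size=5), rng.normal(size=5)
    y_s, y_q = a * x_s, a * x_q
    # Inner loop: one adaptation step from the shared initialization
    w_adapted = w0 - alpha * loss_grad(w0, x_s, y_s)
    # Outer loop: first-order update of the initialization on the query loss
    w0 -= beta * loss_grad(w_adapted, x_q, y_q)
```

After meta-training, a single inner-loop step from `w0` should already reduce the loss on a fresh task from the same family, which is the sense in which the initialization (rather than the architecture) carries the inductive bias.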
Abstract
Artificial neural networks can acquire many aspects of human knowledge from data, making them promising as models of human learning. But what those networks can learn depends upon their inductive biases -- the factors other than the data that influence the solutions they discover -- and the inductive biases of neural networks remain poorly understood, limiting our ability to draw conclusions about human learning from the performance of these systems. Cognitive scientists and machine learning researchers often focus on the architecture of a neural network as a source of inductive bias. In this paper we explore the impact of another source of inductive bias -- the initial weights of the network -- using meta-learning as a tool for finding initial weights that are adapted for specific problems. We evaluate four widely-used architectures -- MLPs, CNNs, LSTMs, and Transformers -- by meta-training 430 different models across three tasks requiring different biases and forms of generalization. We find that meta-learning can substantially reduce or entirely eliminate performance differences across architectures and data representations, suggesting that these factors may be less important as sources of inductive bias than is typically assumed. When differences are present, architectures and data representations that perform well without meta-learning tend to meta-train more effectively. Moreover, all architectures generalize poorly on problems that are far from their meta-training experience, underscoring the need for stronger inductive biases for robust generalization.