Do deep neural networks have an inbuilt Occam's razor?

📅 2023-04-13
🏛️ arXiv.org
📈 Citations: 16
Influential: 1
🤖 AI Summary
This paper addresses the fundamental question of why overparameterized deep neural networks (DNNs) generalize well despite their high capacity. Method: The authors propose that DNNs inherently implement a Bayesian Occam's razor, i.e. an implicit prior favoring (Kolmogorov-)simple functions that is strong enough to counteract the exponential growth of the number of functions with complexity. They develop a function-space Bayesian framework that disentangles the roles of network architecture, training dynamics, and data structure. Combining an error-spectrum approximation of the likelihood for Boolean function classification, control of the prior via an order-to-chaos phase transition, and empirical analysis of the SGD posterior, they characterize DNNs' intrinsic simplicity bias directly in function space. Contribution/Results: The framework accurately predicts the SGD-induced posterior distribution over functions, and shows theoretically and empirically that the synergy between structured data and this inherent simplicity bias is key to DNN generalization.
📝 Abstract
The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data. To disentangle these three components, we apply a Bayesian picture, based on the functions expressed by a DNN, to supervised learning. The prior over functions is determined by the network, and is varied by exploiting a transition between ordered and chaotic regimes. For Boolean function classification, we approximate the likelihood using the error spectrum of functions on data. When combined with the prior, this accurately predicts the posterior, measured for DNNs trained with stochastic gradient descent. This analysis reveals that structured data, combined with an intrinsic Occam's razor-like inductive bias towards (Kolmogorov) simple functions that is strong enough to counteract the exponential growth of the number of functions with complexity, is a key to the success of DNNs.
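The Bayesian picture the abstract describes can be illustrated with a toy sketch (an assumption-laden illustration, not the paper's actual setup): enumerate all Boolean functions on a few inputs, place a simplicity-biased prior on them with a crude compression-based stand-in for Kolmogorov complexity, and condition on a handful of labelled examples with a 0/1 likelihood. The `zlib`-based complexity measure and the AND target function below are illustrative choices, not taken from the paper.

```python
# Toy function-space Bayesian posterior with a simplicity-biased prior.
import itertools
import zlib

n = 3
inputs = list(itertools.product([0, 1], repeat=n))          # 8 input points
functions = list(itertools.product([0, 1], repeat=2 ** n))  # 256 truth tables

def complexity(f):
    # Crude stand-in for Kolmogorov complexity: compressed truth-table length.
    return len(zlib.compress(bytes(f)))

# Prior P(f) proportional to 2^(-K(f)): simpler functions get more mass.
prior = {f: 2.0 ** -complexity(f) for f in functions}
Z = sum(prior.values())
prior = {f: p / Z for f, p in prior.items()}

# Training data: 4 labelled points from an illustrative target (AND of
# the first two input bits).
target = lambda x: x[0] & x[1]
data = [(x, target(x)) for x in inputs[:4]]

# 0/1 likelihood: a function is either consistent with the data or not.
def consistent(f):
    return all(f[inputs.index(x)] == y for x, y in data)

# Posterior: renormalize the prior over the consistent functions.
post = {f: p for f, p in prior.items() if consistent(f)}
Zp = sum(post.values())
post = {f: p / Zp for f, p in post.items()}

best = max(post, key=post.get)
print("consistent functions:", len(post))
print("posterior mass of MAP function:", round(post[best], 3))
```

Even in this toy setting the posterior concentrates on the compressible (simple) truth tables among those consistent with the data, which is the qualitative effect the paper argues real DNN priors produce at scale.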
Problem

Research questions and friction points this paper is trying to address.

Understand the interplay between DNN architecture, training algorithm, and data structure
Analyze the Bayesian prior over functions in the ordered vs. chaotic regimes
Reveal the intrinsic bias of DNNs towards simple functions and its role in their success
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian approach for DNN function analysis
Transition between ordered and chaotic regimes
Error spectrum approximates likelihood for classification
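The error-spectrum likelihood idea can be sketched with a simplified stand-in (an assumption, not the paper's exact construction): suppose each of the m training labels is independently flipped with probability eps, so a function making e errors on the training set receives likelihood eps^e * (1 - eps)^(m - e). Functions with fewer errors then dominate exponentially when combined with the prior.

```python
# Simplified noisy-label likelihood over function space (illustrative).
import math

def log_likelihood(errors, m, eps=0.05):
    # log P(D|f) for a function making `errors` mistakes on m training
    # labels, each independently flipped with probability eps.
    return errors * math.log(eps) + (m - errors) * math.log(1.0 - eps)

# Fewer training errors -> exponentially more likelihood; multiplied by a
# simplicity-biased prior, this shapes the posterior over functions.
m = 20
for e in (0, 1, 5):
    print(f"errors={e}: log P(D|f) = {log_likelihood(e, m):.2f}")
```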
Chris Mingard
DPhil student, University of Oxford
Deep Learning · AI · Discrete mathematics
Henry Rees
Rudolf Peierls Centre for Theoretical Physics, University of Oxford; Oxford OX1 3PU, UK
Guillermo Valle Pérez
Rudolf Peierls Centre for Theoretical Physics, University of Oxford; Oxford OX1 3PU, UK
A. Louis
Rudolf Peierls Centre for Theoretical Physics, University of Oxford; Oxford OX1 3PU, UK