🤖 AI Summary
This paper investigates whether deep neural networks (DNNs) implicitly implement a “computational Occam’s razor”, that is, whether they favor the simplest algorithm consistent with the training data, and whether this could explain their success over more traditional statistical methods.
Method: Working in the “Harder than Monte Carlo” (HTMC) regime ($γ>2$), the authors show that the set of real-valued functions that can be $ε$-approximated by a binary circuit of size at most $cε^{-γ}$ is convex, which allows an HTMC norm on functions to be defined. They then define a weighted ℓ₁ norm on the parameters of a ResNet and relate the induced ResNet norm to the HTMC norm through an almost matching sandwich bound.
Contribution/Results: The work shows that minimizing the weighted ℓ₁ ResNet norm is equivalent to finding a circuit that fits the data with an almost minimal number of nodes (within a power of 2 of optimal) in the HTMC regime. It thereby connects circuit complexity, convexity of the function class, and parameter-norm regularization, offering a computation-centric explanation for the empirical success of deep learning.
📝 Abstract
This paper argues that DNNs implement a computational Occam's razor -- finding the `simplest' algorithm that fits the data -- and that this could explain their incredible and wide-ranging success over more traditional statistical methods. We start with the discovery that the set of real-valued functions $f$ that can be $ε$-approximated with a binary circuit of size at most $cε^{-γ}$ becomes convex in the `Harder than Monte Carlo' (HTMC) regime, when $γ>2$, allowing for the definition of an HTMC norm on functions. In parallel, one can define a complexity measure on the parameters of a ResNet (a weighted $\ell_1$ norm of the parameters), which induces a `ResNet norm' on functions. The HTMC and ResNet norms can then be related by an almost matching sandwich bound. Thus minimizing this ResNet norm is equivalent to finding a circuit that fits the data with an almost minimal number of nodes (within a power of 2 of being optimal). ResNets thus appear as an alternative model for computation of real functions, better adapted to the HTMC regime and its convexity.
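The function class described in the abstract can be written out explicitly. The following is only a sketch: the symbols $\mathcal{F}_{\gamma,c}$, $C_\varepsilon$, and $|C_\varepsilon|$ (the number of nodes of the circuit), as well as the unspecified error norm $\|\cdot\|$, are notation assumed here for illustration and may differ from the paper's.

$$
\mathcal{F}_{\gamma,c} = \left\{ f \;:\; \forall\, \varepsilon > 0,\ \exists \text{ a binary circuit } C_\varepsilon \text{ with } |C_\varepsilon| \le c\,\varepsilon^{-\gamma} \text{ and } \|f - C_\varepsilon\| \le \varepsilon \right\}
$$

For comparison, averaging $m$ independent samples gives an error of order $m^{-1/2}$, so reaching accuracy $\varepsilon$ costs about $\varepsilon^{-2}$ samples. The HTMC regime $\gamma > 2$ allows circuit sizes that grow faster in $1/\varepsilon$ than this Monte Carlo rate, which is presumably the origin of the name. Reading "within a power of 2" as a squaring, a dataset that can be fit by a circuit with $N$ nodes would be fit by the ResNet-norm minimizer with a circuit of roughly at most $N^2$ nodes, up to constants that the abstract does not specify.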