Deep Learning as a Convex Paradigm of Computation: Minimizing Circuit Size with ResNets

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates whether deep neural networks (DNNs) implicitly adhere to a “computational Occam’s razor”—i.e., automatically favoring the simplest algorithm consistent with training data—to explain their superior generalization over classical statistical methods. Method: Focusing on the “Harder than Monte Carlo” (HTMC) function approximation setting, the authors establish convexity of the HTMC-approximable function class and derive tight bounds linking the HTMC norm to the circuit complexity of ResNet architectures. They analyze weighted ℓ₁-norm regularized ResNets under this framework. Contribution/Results: The work proves that minimizing the weighted ℓ₁ norm in ResNets is equivalent to near-optimal circuit-size compression under the HTMC mechanism. It introduces the first unified complexity measure integrating circuit complexity, functional convexity, and parameter norm regularization—providing a novel, computation-centric explanation for the empirical success of deep learning.

📝 Abstract
This paper argues that DNNs implement a computational Occam's razor -- finding the `simplest' algorithm that fits the data -- and that this could explain their incredible and wide-ranging success over more traditional statistical methods. We start with the discovery that the set of real-valued functions $f$ that can be $ε$-approximated with a binary circuit of size at most $cε^{-γ}$ becomes convex in the `Harder than Monte Carlo' (HTMC) regime, when $γ>2$, allowing for the definition of an HTMC norm on functions. In parallel one can define a complexity measure on the parameters of a ResNet (a weighted $\ell_1$ norm of the parameters), which induces a `ResNet norm' on functions. The HTMC and ResNet norms can then be related by an almost matching sandwich bound. Thus minimizing this ResNet norm is equivalent to finding a circuit that fits the data with an almost minimal number of nodes (within a power of 2 of being optimal). ResNets thus appear as an alternative model for computation of real functions, better adapted to the HTMC regime and its convexity.
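The abstract's construction can be sketched in notation as follows. The class $\mathcal{F}_{c,\gamma}$, the Minkowski-gauge definition of the norm, and the schematic form of the sandwich bound are illustrative reconstructions from the abstract, not the paper's exact definitions.

```latex
% Functions epsilon-approximable by binary circuits of size at most c * eps^{-gamma}:
\mathcal{F}_{c,\gamma} \;=\; \bigl\{\, f \;:\; \forall\, \varepsilon>0,\
  \exists\ \text{binary circuit } C_\varepsilon,\
  \operatorname{size}(C_\varepsilon)\le c\,\varepsilon^{-\gamma},\
  \|f-C_\varepsilon\| \le \varepsilon \,\bigr\}.

% In the HTMC regime (gamma > 2) this class is convex, so a norm can be defined,
% e.g. as the Minkowski gauge of the unit-constant class (one natural choice):
\|f\|_{\mathrm{HTMC}} \;=\; \inf\bigl\{\, t>0 \;:\; f/t \in \mathcal{F}_{1,\gamma} \,\bigr\}.

% The sandwich bound, schematically: the ResNet norm induced by the weighted
% l1 parameter norm is almost equivalent to the HTMC norm,
\|f\|_{\mathrm{HTMC}} \;\lesssim\; \|f\|_{\mathrm{ResNet}} \;\lesssim\; \|f\|_{\mathrm{HTMC}},
% so that minimizing the ResNet norm recovers a circuit whose size is within
% a power of 2 of the minimal one:  size(C) \lesssim (\text{optimal size})^{2}.
```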
Problem

Research questions and friction points this paper is trying to address.

Does deep learning generalize well because it implicitly finds the simplest algorithm that fits the data, i.e., a computational Occam's razor?
How can ResNet parameter norms be related to circuit complexity in a convex regime of computation?
Can network size be minimized while maintaining near-optimal function approximation capacity?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Weighted ℓ₁ norm of ResNet parameters as a complexity measure inducing a norm on functions
Sandwich bound showing the ResNet norm approximates the HTMC circuit-complexity norm
Convex formulation in which norm minimization yields a near-optimal circuit size (within a power of 2)
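To make the weighted ℓ₁ complexity measure concrete, here is a minimal numpy sketch of a residual network with blocks $x \mapsto x + V\,\sigma(Wx)$ and a per-block weighted ℓ₁ penalty on the parameters. The functions `resnet_forward` and `weighted_l1`, the block form, and the choice of layer weights are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def relu(z):
    # ReLU nonlinearity used inside each residual block
    return np.maximum(z, 0.0)

def resnet_forward(x, blocks):
    """Apply residual blocks x <- x + V @ relu(W @ x)."""
    for W, V in blocks:
        x = x + V @ relu(W @ x)
    return x

def weighted_l1(blocks, layer_weights):
    """Weighted l1 norm of all parameters: sum_l w_l * (|W_l|_1 + |V_l|_1).

    The per-layer weights w_l are an illustrative choice; the paper's
    measure is a weighted l1 norm whose exact weights are not given here.
    """
    return sum(
        w * (np.abs(W).sum() + np.abs(V).sum())
        for (W, V), w in zip(blocks, layer_weights)
    )

rng = np.random.default_rng(0)
d, width, depth = 4, 8, 3  # input dimension, block width, number of blocks
blocks = [
    (rng.normal(size=(width, d)) / width,   # W_l: d -> width
     rng.normal(size=(d, width)) / width)   # V_l: width -> d
    for _ in range(depth)
]

x = rng.normal(size=d)
y = resnet_forward(x, blocks)
penalty = weighted_l1(blocks, layer_weights=[1.0] * depth)
```

In a training loop this `penalty` would be added to the data-fitting loss; the abstract's claim is that minimizing such a weighted ℓ₁ term is (up to a power of 2) equivalent to minimizing the size of a circuit computing the learned function.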