🤖 AI Summary
This work establishes, for the first time, a theoretical foundation for the superior generalization of convolutional neural networks (CNNs) over fully connected networks on difficult data distributions such as the high-dimensional sphere. By leveraging local receptive fields and weight sharing, CNNs alter the implicit regularization induced by gradient descent, leading to significantly improved generalization. Through a synthesis of margin stability theory, high-dimensional probability, and a geometric model of convolution, the analysis uncovers a coupling between the convolutional architecture and the patch-manifold structure of images. When the receptive field size $m$ is much smaller than the ambient dimension $d$, CNNs achieve a generalization rate of $n^{-1/6 + O(m/d)}$ on spherical data, substantially better than that of fully connected networks, and this theoretical advantage is corroborated on natural image datasets.
📝 Abstract
We study how architectural inductive bias reshapes the implicit regularization induced by the edge-of-stability phenomenon in gradient descent. Prior work established that for fully connected networks the strength of this regularization is governed solely by the global input geometry; consequently, it is insufficient to prevent overfitting on difficult distributions such as the high-dimensional sphere. In this paper, we show that locality and weight sharing fundamentally change this picture. Specifically, we prove that, provided the receptive field size $m$ remains small relative to the ambient dimension $d$, convolutional networks generalize on spherical data at a rate of $n^{-\frac{1}{6} + O(m/d)}$, a regime in which fully connected networks provably fail. This result shows that weight sharing couples the learned filters to the low-dimensional patch manifold, thereby bypassing the high dimensionality of the ambient space. We further corroborate the theory by analyzing the patch geometry of natural images, showing that standard convolutional designs induce patch distributions that are highly amenable to this stability mechanism, thus providing a systematic explanation for the superior generalization of convolutional networks over fully connected baselines.
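To build intuition for the rate $n^{-\frac{1}{6} + O(m/d)}$, the following minimal sketch plugs in numbers for a typical image setting. The hidden constant inside the $O(m/d)$ term is not specified by the abstract, so the constant `c` below is an illustrative assumption, as are the concrete values of $m$, $d$, and $n$; the point is only to show how the exponent degrades as the receptive field grows relative to the ambient dimension.

```python
def rate_exponent(m, d, c=1.0):
    """Hypothetical exponent of the bound n^{-1/6 + O(m/d)}.

    c stands in for the unspecified constant hidden in the O(m/d) term
    (an assumption made purely for illustration).
    """
    return -1.0 / 6.0 + c * m / d


def generalization_bound(n, m, d, c=1.0):
    """Illustrative bound n^{exponent}; smaller values mean better generalization."""
    return n ** rate_exponent(m, d, c)


# Small receptive field relative to the ambient dimension (m << d):
# the exponent stays close to -1/6, so the bound shrinks with n.
small_patch = generalization_bound(n=10**6, m=9, d=3072)

# A receptive field comparable to d erodes the advantage: the O(m/d)
# correction can flip the exponent's sign, and the bound no longer decays.
large_patch = generalization_bound(n=10**6, m=1024, d=3072)

print(small_patch, large_patch)
```

Under these assumed values, the small-patch exponent remains negative while the large-patch exponent does not, mirroring the abstract's claim that the convolutional advantage holds only when $m$ is small relative to $d$.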