The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent

📅 2025-02-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
The sample-efficiency advantage of deep neural networks over shallow models in learning high-dimensional hierarchical functions has lacked rigorous theoretical justification. Method: We construct single- and multi-index Gaussian hierarchical target functions and, in the high-dimensional limit, combine asymptotic analysis, modeling of gradient-descent dynamics, and feature-learning theory to rigorously characterize how deep networks implicitly decompose features and reduce the effective dimensionality of the problem. Contribution: We prove that GD-trained deep networks decouple high-dimensional learning into a sequence of lower-dimensional subproblems, achieving a drastically smaller sample complexity than shallow models. Crucially, we establish that the fundamental role of depth lies not merely in increased parameter capacity, but in adaptively capturing the intrinsic hierarchical structure of the target function, enabling implicit feature decomposition and structural alignment between the network architecture and the geometry of the function. This addresses a long-standing open question regarding the statistical benefits of depth in nonparametric regression settings.
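The paper defines its Gaussian hierarchical targets precisely; the snippet below is only a toy illustration of their general shape: Gaussian inputs, a low-dimensional latent projection, and a second nonlinearity applied on top. The dimensions, link functions `g1`/`g2`, and the two-level composition are assumptions chosen for illustration, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 1000          # ambient dimension (high-dimensional input)
n = 5000          # number of samples

# Gaussian inputs x ~ N(0, I_d)
X = rng.standard_normal((n, d))

# Hypothetical hierarchical (multi-index) target: an inner projection onto a
# low-dimensional latent subspace, followed by a second nonlinearity that
# itself depends on only a few directions of the inner output.
U = rng.standard_normal((d, 2)) / np.sqrt(d)   # first latent subspace (2 directions)
v = rng.standard_normal(2)                     # second-level index vector

def g1(z):
    # inner link function (assumed: elementwise tanh)
    return np.tanh(z)

def g2(s):
    # outer link function (assumed: cubic nonlinearity)
    return s ** 3 - 3 * s

# y = g2(<v, g1(U^T x)>): once the first level is learned, the remaining
# problem is only 2-dimensional -- the "hierarchy of latent subspaces".
Y = g2(g1(X @ U) @ v)
```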

📝 Abstract
Understanding the advantages of deep neural networks trained by gradient descent (GD) compared to shallow models remains an open theoretical challenge. While the study of multi-index models with Gaussian data in high dimensions has provided analytical insights into the benefits of GD-trained neural networks over kernels, the role of depth in improving sample complexity and generalization in GD-trained networks remains poorly understood. In this paper, we introduce a class of target functions (single and multi-index Gaussian hierarchical targets) that incorporate a hierarchy of latent subspace dimensionalities. This framework enables us to analytically study the learning dynamics and generalization performance of deep networks compared to shallow ones in the high-dimensional limit. Specifically, our main theorem shows that feature learning with GD reduces the effective dimensionality, transforming a high-dimensional problem into a sequence of lower-dimensional ones. This enables learning the target function with drastically fewer samples than with shallow networks. While the results are proven in a controlled training setting, we also discuss more common training procedures and argue that they learn through the same mechanisms. These findings open the way to further quantitative studies of the crucial role of depth in learning hierarchical structures with deep networks.
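As a minimal sketch of the deep-versus-shallow comparison described in the abstract, the code below trains a one-hidden-layer and a two-hidden-layer ReLU network with plain full-batch gradient descent on a toy hierarchical target and reports test error. The architectures, widths, learning rate, step count, and the target itself are assumptions for illustration; this is not the paper's controlled training procedure or its asymptotic analysis.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d, n_train, n_test = 200, 2000, 2000

# Toy hierarchical target (same shape as the NumPy sketch above); the
# specific link functions and latent dimensions are illustrative assumptions.
U = torch.randn(d, 2) / d ** 0.5
v = torch.randn(2)

def toy_target(X):
    s = torch.tanh(X @ U) @ v
    return (s.pow(3) - 3 * s).unsqueeze(1)

X_tr, X_te = torch.randn(n_train, d), torch.randn(n_test, d)
Y_tr, Y_te = toy_target(X_tr), toy_target(X_te)

def mlp(widths):
    # fully connected ReLU network with the given layer widths
    layers = []
    for w_in, w_out in zip(widths[:-1], widths[1:]):
        layers += [nn.Linear(w_in, w_out), nn.ReLU()]
    return nn.Sequential(*layers[:-1])  # drop the activation after the output layer

def train_and_eval(model, steps=3000, lr=5e-3):
    opt = torch.optim.SGD(model.parameters(), lr=lr)  # plain full-batch gradient descent
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(model(X_tr), Y_tr).backward()
        opt.step()
    with torch.no_grad():
        return nn.functional.mse_loss(model(X_te), Y_te).item()

print("shallow (1 hidden layer)  test MSE:", train_and_eval(mlp([d, 512, 1])))
print("deep    (2 hidden layers) test MSE:", train_and_eval(mlp([d, 256, 256, 1])))
```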
Problem

Research questions and friction points this paper is trying to address.

Investigating the depth advantage of gradient-descent-trained neural networks on high-dimensional hierarchical functions
Analyzing how depth reduces sample complexity through effective dimensionality reduction during learning
Establishing a theoretical framework for the learning dynamics of hierarchical targets that compares deep and shallow networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introducing single- and multi-index Gaussian hierarchical targets with a hierarchy of latent subspace dimensionalities
Feature learning with gradient descent reduces the effective dimensionality of the problem (see the sketch after this list)
Deep networks require drastically fewer samples than shallow ones to learn hierarchical targets
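As a crude, illustrative proxy for the effective-dimensionality-reduction mechanism named above, one can check how much of a trained first layer's weight energy falls inside the latent subspace of a toy target before and after gradient descent. The network, optimizer settings, target, and the energy metric below are all assumptions for illustration, not quantities or diagnostics from the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy target on Gaussian inputs with a 2-dimensional latent subspace span(U).
d, n = 200, 4000
U = torch.linalg.qr(torch.randn(d, 2)).Q        # orthonormal basis of the latent subspace
v = torch.randn(2)
X = torch.randn(n, d)
s = torch.tanh(X @ U) @ v
Y = (s.pow(3) - 3 * s).unsqueeze(1)

net = nn.Sequential(nn.Linear(d, 128), nn.ReLU(),
                    nn.Linear(128, 128), nn.ReLU(),
                    nn.Linear(128, 1))

def energy_in_subspace(W, U):
    # fraction of the squared Frobenius norm of W captured by projecting its rows onto span(U)
    return ((W @ U).norm() ** 2 / W.norm() ** 2).item()

print("before training:", energy_in_subspace(net[0].weight.detach(), U))

opt = torch.optim.SGD(net.parameters(), lr=5e-3)  # plain full-batch gradient descent
for _ in range(3000):
    opt.zero_grad()
    nn.functional.mse_loss(net(X), Y).backward()
    opt.step()

print("after training: ", energy_in_subspace(net[0].weight.detach(), U))
```

At random initialization the expected fraction is roughly 2/d; an increase after training indicates that the first layer has concentrated on the low-dimensional latent directions, which is the informal sense in which feature learning reduces the effective dimensionality.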