🤖 AI Summary
This work proposes a pointwise generalization theory for fully connected deep neural networks, addressing the difficulty existing frameworks have in characterizing the role of nonlinear feature learning in generalization. It introduces the pointwise Riemannian Dimension, a complexity measure derived from the eigenvalue spectrum of the learned feature representations at each layer, and uses it to establish hypothesis-dependent, representation-aware generalization bounds. These bounds improve on approaches based on model size, products of norms, and infinite-width linearizations, and are reported to be orders of magnitude tighter in both theory and experiment. Empirically, the Riemannian Dimension exhibits substantial feature compression, decreases with increased over-parameterization, and captures the implicit bias of optimizers.
📝 Abstract
We address the fundamental question of why deep neural networks generalize by establishing a pointwise generalization theory for fully connected networks. This framework resolves long-standing barriers to characterizing the rich nonlinear feature-learning regime and builds a new statistical foundation for representation learning. For each trained model, we characterize the hypothesis via a pointwise Riemannian Dimension, derived from the eigenvalues of the learned feature representations across layers. This establishes a principled framework for deriving hypothesis-dependent, representation-aware generalization bounds. These bounds offer a systematic upgrade over approaches based on model size, products of norms, and infinite-width linearizations, yielding guarantees that are orders of magnitude tighter in both theory and experiment. Analytically, we identify the structural properties and mathematical principles that explain the tractability of deep networks. Empirically, the pointwise Riemannian Dimension exhibits substantial feature compression, decreases with increased over-parameterization, and captures the implicit bias of optimizers. Taken together, our results indicate that deep networks are mathematically tractable in practical regimes and that their generalization is sharply explained by pointwise, feature-spectrum-aware complexity.
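To make the idea of a feature-spectrum-aware complexity more concrete, the sketch below computes a generic spectral effective dimension for each layer of a small ReLU network. It is not the paper's Riemannian Dimension, whose exact definition is not given in this abstract. The function names `layer_features` and `effective_dimension`, the threshold `eps`, the bias-free architecture, and the formula itself are all illustrative assumptions, chosen only to show how a per-layer quantity can be read off the eigenvalues of the feature covariance.

```python
import numpy as np

def layer_features(x, weights):
    """Forward pass through a bias-free ReLU MLP (assumed architecture).

    Returns the feature matrix produced by each layer, one row per sample.
    """
    feats = []
    h = x
    for w in weights:
        h = np.maximum(h @ w, 0.0)  # ReLU activation
        feats.append(h)
    return feats

def effective_dimension(features, eps=1e-3):
    """Soft count of covariance eigenvalues that are large relative to eps.

    Illustrative proxy only: sum_i lambda_i / (lambda_i + eps), where lambda_i
    are the eigenvalues of the uncentered feature second-moment matrix.
    """
    n = features.shape[0]
    cov = features.T @ features / n
    # eigvalsh handles symmetric matrices; clip tiny negative round-off.
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return float(np.sum(eigvals / (eigvals + eps)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(256, 16))
    # Random weights stand in for a trained model in this sketch.
    weights = [rng.normal(size=(16, 64)) / 4.0, rng.normal(size=(64, 64)) / 8.0]
    for depth, feats in enumerate(layer_features(x, weights), start=1):
        print(f"layer {depth}: width={feats.shape[1]}, "
              f"effective dimension={effective_dimension(feats):.2f}")
```

A quantity of this kind is bounded by the layer width and falls well below it when the features concentrate in a few directions, which is the sort of compression the abstract refers to. Any comparison with the paper's bounds would need its actual definition.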