Generalization Bounds for Rank-sparse Neural Networks

📅 2025-10-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates how the bottleneck rank phenomenon in neural network weight matrices affects generalization. Motivated by the empirically observed near-low-rank structure of activations and weights in deep models, we establish, for the first time, a theoretical connection between bottleneck rank and Schatten $p$-quasi-norm regularization. We propose a novel generalization analysis framework grounded in rank sparsity. Leveraging regularization theory under gradient descent for linear networks, we derive a generalization error bound that explicitly depends on the effective rank $r$, yielding a sample complexity upper bound of $\widetilde{O}(WrL^2)$. When $p < 2$, this bound strictly improves upon classical bounds based on Frobenius or spectral norms, revealing the fundamental role of low-rank structure in enhancing generalization. Our analysis provides principled theoretical justification for designing rank-aware regularization methods.
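As a minimal sketch of the quantity the bound is built on (not a restatement of the paper's exact bound or constants): the definition below is the standard Schatten $p$-quasi-norm and the rank inequality is a generic consequence of Hölder's inequality. Here $W_\ell$ denotes a layer's weight matrix, while $W$ in the bound denotes the network width.

```latex
% Standard Schatten p-(quasi-)norm of a weight matrix W_\ell,
% written in terms of its singular values \sigma_i(W_\ell).
\|W_\ell\|_{S_p} = \Bigl(\textstyle\sum_{i} \sigma_i(W_\ell)^{p}\Bigr)^{1/p},
\qquad 0 < p < 1 \ \text{(quasi-norm)}, \quad p \ge 1 \ \text{(norm)}.

% Generic rank dependence (Hölder's inequality): if rank(W_\ell) <= r, then
\|W_\ell\|_{S_p} \le r^{\,1/p - 1/2}\, \|W_\ell\|_{F} \quad \text{for } p \le 2,
% which is one way an explicit factor of the effective rank r can enter a
% sample-complexity bound of the form \widetilde{O}(W r L^2).
```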

📝 Abstract
It has recently been observed in the literature that neural networks exhibit a bottleneck rank property: for larger depths, the activations and weights of neural networks trained with gradient-based methods tend to be of approximately low rank. In fact, the rank of the activations of each layer converges to a fixed value referred to as the ``bottleneck rank'', which is the minimum rank required to represent the training data. This perspective is in line with the observation that regularizing linear networks (without activations) with weight decay is equivalent to minimizing the Schatten $p$ quasi norm of the neural network. In this paper we investigate the implications of this phenomenon for generalization. More specifically, we prove generalization bounds for neural networks which exploit the approximate low rank structure of the weight matrices if present. The final results rely on the Schatten $p$ quasi norms of the weight matrices: for small $p$, the bounds exhibit a sample complexity $\widetilde{O}(WrL^2)$ where $W$ and $L$ are the width and depth of the neural network respectively and where $r$ is the rank of the weight matrices. As $p$ increases, the bound behaves more like a norm-based bound instead.
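As a purely illustrative aside (not code from the paper), the snippet below computes the Schatten $p$-quasi-norm from the singular values of a matrix and compares a low-rank matrix with a generic full-rank one at the same Frobenius norm; the function name and matrix sizes are arbitrary choices for this sketch.

```python
import numpy as np

def schatten_p_quasi_norm(weight: np.ndarray, p: float) -> float:
    """Schatten p-(quasi-)norm: (sum_i sigma_i^p)^(1/p), computed via the SVD."""
    singular_values = np.linalg.svd(weight, compute_uv=False)
    return float(np.sum(singular_values ** p) ** (1.0 / p))

rng = np.random.default_rng(0)

# A rank-2 matrix and a generic (full-rank) Gaussian matrix,
# both scaled to unit Frobenius norm for a fair comparison.
low_rank = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 64))
low_rank /= np.linalg.norm(low_rank, "fro")
full_rank = rng.standard_normal((64, 64))
full_rank /= np.linalg.norm(full_rank, "fro")

for p in (0.5, 1.0, 2.0):
    print(f"p={p}: low-rank {schatten_p_quasi_norm(low_rank, p):.2f}  "
          f"full-rank {schatten_p_quasi_norm(full_rank, p):.2f}")
# At p = 2 both values coincide (they equal the shared Frobenius norm), while for
# p < 2 the quasi-norm of the low-rank matrix is much smaller, illustrating why
# Schatten-based bounds can beat Frobenius/spectral bounds when weights are near low rank.
```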
Problem

Research questions and friction points this paper is trying to address.

Prove generalization bounds for low-rank neural networks
Analyze Schatten p-quasi norm impact on sample complexity
Establish rank-dependent bounds for network width and depth
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalization bounds exploit low-rank weight matrices
Bounds use Schatten p quasi-norms of weights
Sample complexity scales with network width, depth, and rank