Generalization Bounds for Rank-sparse Neural Networks

📅 2025-10-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates how the bottleneck rank phenomenon in neural network weight matrices affects generalization. Motivated by the empirically observed near-low-rank structure of activations and weights in deep models, we establish, for the first time, a theoretical connection between bottleneck rank and Schatten $p$-quasi-norm regularization. We propose a novel generalization analysis framework grounded in rank sparsity. Leveraging regularization theory under gradient descent for linear networks, we derive a generalization error bound that explicitly depends on the effective rank $r$, yielding a sample complexity upper bound of $\widetilde{O}(WrL^2)$. When $p < 2$, this bound strictly improves upon classical bounds based on Frobenius or spectral norms, revealing the fundamental role of low-rank structure in enhancing generalization. Our analysis provides principled theoretical justification for designing rank-aware regularization methods.
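As a minimal sketch of the quantity the bound is built on (not a restatement of the paper's exact bound or constants): the definition below is the standard Schatten $p$-quasi-norm and the rank inequality is a generic consequence of Hölder's inequality. Here $W_\ell$ denotes a layer's weight matrix, while $W$ in the bound denotes the network width.

```latex
% Standard Schatten p-(quasi-)norm of a weight matrix W_\ell,
% written in terms of its singular values \sigma_i(W_\ell).
\|W_\ell\|_{S_p} = \Bigl(\textstyle\sum_{i} \sigma_i(W_\ell)^{p}\Bigr)^{1/p},
\qquad 0 < p < 1 \ \text{(quasi-norm)}, \quad p \ge 1 \ \text{(norm)}.

% Generic rank dependence (Hölder's inequality): if rank(W_\ell) <= r, then
\|W_\ell\|_{S_p} \le r^{\,1/p - 1/2}\, \|W_\ell\|_{F} \quad \text{for } p \le 2,
% which is one way an explicit factor of the effective rank r can enter a
% sample-complexity bound of the form \widetilde{O}(W r L^2).
```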

📝 Abstract
It has recently been observed in the literature that neural networks exhibit a bottleneck rank property: for larger depths, the activations and weights of neural networks trained with gradient-based methods tend to be of approximately low rank. In fact, the rank of the activations of each layer converges to a fixed value referred to as the ``bottleneck rank'', which is the minimum rank required to represent the training data. This perspective is in line with the observation that regularizing linear networks (without activations) with weight decay is equivalent to minimizing the Schatten $p$ quasi norm of the neural network. In this paper we investigate the implications of this phenomenon for generalization. More specifically, we prove generalization bounds for neural networks which exploit the approximate low rank structure of the weight matrices if present. The final results rely on the Schatten $p$ quasi norms of the weight matrices: for small $p$, the bounds exhibit a sample complexity $\widetilde{O}(WrL^2)$ where $W$ and $L$ are the width and depth of the neural network respectively and where $r$ is the rank of the weight matrices. As $p$ increases, the bound behaves more like a norm-based bound instead.
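As a purely illustrative aside (not code from the paper), the snippet below computes the Schatten $p$-quasi-norm from the singular values of a matrix and compares a low-rank matrix with a generic full-rank one at the same Frobenius norm; the function name and matrix sizes are arbitrary choices for this sketch.

```python
import numpy as np

def schatten_p_quasi_norm(weight: np.ndarray, p: float) -> float:
    """Schatten p-(quasi-)norm: (sum_i sigma_i^p)^(1/p), computed via the SVD."""
    singular_values = np.linalg.svd(weight, compute_uv=False)
    return float(np.sum(singular_values ** p) ** (1.0 / p))

rng = np.random.default_rng(0)

# A rank-2 matrix and a generic (full-rank) Gaussian matrix,
# both scaled to unit Frobenius norm for a fair comparison.
low_rank = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 64))
low_rank /= np.linalg.norm(low_rank, "fro")
full_rank = rng.standard_normal((64, 64))
full_rank /= np.linalg.norm(full_rank, "fro")

for p in (0.5, 1.0, 2.0):
    print(f"p={p}: low-rank {schatten_p_quasi_norm(low_rank, p):.2f}  "
          f"full-rank {schatten_p_quasi_norm(full_rank, p):.2f}")
# At p = 2 both values coincide (they equal the shared Frobenius norm), while for
# p < 2 the quasi-norm of the low-rank matrix is much smaller, illustrating why
# Schatten-based bounds can beat Frobenius/spectral bounds when weights are near low rank.
```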
Problem

Research questions and friction points this paper is trying to address.

Prove generalization bounds for low-rank neural networks
Analyze Schatten p-quasi norm impact on sample complexity
Establish rank-dependent bounds for network width and depth
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalization bounds exploit low-rank weight matrices
Bounds use Schatten p quasi-norms of weights
Sample complexity scales with network width, depth, and rank