Sufficient Conditions for Stability of Minimum-Norm Interpolating Deep ReLU Networks

šŸ“… 2026-02-14
šŸ“ˆ Citations: 0
✨ Influential: 0
šŸ¤– AI Summary
This study investigates the algorithmic stability of over-parameterized, homogeneous deep ReLU networks that achieve zero training error via the minimum-$L_2$-norm interpolating solution. Combining algorithmic stability theory, minimum-norm interpolation analysis, and structural properties of neural networks, the work identifies sufficient conditions for stability: robustness to small perturbations of the training data is guaranteed whenever the network contains a (possibly small) stable sub-network followed by a layer with a low-rank weight matrix; by contrast, a stable sub-network alone does not guarantee stability when the following layer is not low-rank. These findings offer new theoretical insight into the generalization behavior of minimum-norm interpolating deep networks, clarifying how architectural structure influences stability and, in turn, generalization in highly over-parameterized regimes.
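For concreteness, the central object can be written down as follows. This is a generic sketch of minimum-$L_2$-norm interpolation assuming an exact-fit constraint; the paper's precise setting (e.g., a margin constraint for classification) may differ.

```latex
% Minimum-L2-norm interpolation: among all parameter settings that fit the
% n training points, take one with the smallest L2 norm. Here f_theta is a
% homogeneous deep ReLU network with parameters theta = (W_1, ..., W_L);
% the exact-fit constraint below is an illustrative choice.
\hat{\theta} \;\in\; \operatorname*{arg\,min}_{\theta} \; \|\theta\|_2^2
\quad \text{subject to} \quad f_{\theta}(x_i) = y_i, \qquad i = 1, \dots, n.
```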

šŸ“ Abstract
Algorithmic stability is a classical framework for analyzing the generalization error of learning algorithms. It predicts that an algorithm has small generalization error if it is insensitive to small perturbations in the training set, such as the removal or replacement of a training point. While stability has been demonstrated for numerous well-known algorithms, this framework has had limited success in analyses of deep neural networks. In this paper, we study the algorithmic stability of deep ReLU homogeneous neural networks that achieve zero training error using parameters with the smallest $L_2$ norm, also known as the minimum-norm interpolation, a phenomenon that can be observed in overparameterized models trained by gradient-based algorithms. We investigate sufficient conditions for such networks to be stable. We find that 1) such networks are stable when they contain a (possibly small) stable sub-network, followed by a layer with a low-rank weight matrix, and 2) such networks are not guaranteed to be stable even when they contain a stable sub-network, if the following layer is not low-rank. The low-rank assumption is inspired by recent empirical and theoretical results which demonstrate that training deep neural networks is biased towards low-rank weight matrices, for minimum-norm interpolation and weight-decay regularization.
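The stability notion the abstract invokes is typically formalized as uniform stability in the sense of Bousquet and Elisseeff (2002). Below is a sketch of the replace-one variant and the classical generalization bound it implies; the paper's exact stability variant (removal vs. replacement, parameter- vs. loss-level) may differ.

```latex
% Uniform (replace-one) stability: replacing any single training point changes
% the loss at any test point z by at most beta. S^{(i)} denotes the training
% set S with its i-th point replaced.
\sup_{S,\, S^{(i)},\, z} \bigl|\, \ell\bigl(A(S), z\bigr) - \ell\bigl(A(S^{(i)}), z\bigr) \bigr| \;\le\; \beta .
% A beta-stable algorithm generalizes: the expected gap between the population
% risk R and the empirical risk \hat{R}_S is at most beta.
\mathbb{E}_S\!\left[ R\bigl(A(S)\bigr) - \hat{R}_S\bigl(A(S)\bigr) \right] \;\le\; \beta .
```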
Problem

Research questions and friction points this paper is trying to address.

algorithmic stability
minimum-norm interpolation
deep ReLU networks
generalization error
low-rank weight matrices
Innovation

Methods, ideas, or system contributions that make the work stand out.

algorithmic stability
minimum-norm interpolation
deep ReLU networks
low-rank weight matrices
overparameterized models
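The low-rank weight matrices highlighted above can be probed empirically. Below is a minimal illustrative sketch, not taken from the paper, that checks the numerical rank of each layer's weight matrix in a small PyTorch ReLU network; the architecture, widths, and the tolerance `tol` are hypothetical choices made for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical over-parameterized ReLU network; widths chosen for illustration.
model = nn.Sequential(
    nn.Linear(20, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1, bias=False),
)

def numerical_rank(W: torch.Tensor, tol: float = 1e-3) -> int:
    """Count singular values above tol times the largest singular value."""
    s = torch.linalg.svdvals(W)  # singular values in descending order
    return int((s > tol * s[0]).sum())

# Inspect each weight matrix; a trained minimum-norm interpolant would be
# probed the same way to see whether some layer is approximately low-rank.
for name, p in model.named_parameters():
    if p.ndim == 2:  # weight matrices only, skip bias vectors
        print(f"{name}: shape {tuple(p.shape)}, numerical rank {numerical_rank(p.data)}")
```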