On-Average Stability of Multipass Preconditioned SGD and Effective Dimension

📅 2026-03-12
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the impact of preconditioning strategies, loss curvature, and noise geometry on the generalization performance of multi-epoch preconditioned stochastic gradient descent (PSGD). To this end, it introduces the first average algorithmic stability framework tailored to multi-epoch SGD, overcoming challenges posed by data reuse–induced dependencies. The analysis establishes a theoretical link between generalization error and an effective dimension that depends on the curvature of the loss landscape. The main contributions include deriving an upper bound on excess risk characterized by this effective dimension, revealing that ill-chosen preconditioners lead to suboptimal generalization, and providing a matching instance-dependent lower bound that confirms the tightness of the proposed guarantee.
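For context, a common way an effective dimension is defined in linear and kernel regression analyses (the paper's exact definition may differ) is

```latex
d_{\mathrm{eff}}(\lambda) \;=\; \operatorname{tr}\!\bigl(H \,(H + \lambda I)^{-1}\bigr) \;=\; \sum_{i=1}^{d} \frac{\mu_i}{\mu_i + \lambda},
```

where $H$ is the curvature (Hessian or covariance) of the population risk with eigenvalues $\mu_i$ and $\lambda$ is a regularisation scale; it counts, in a soft way, the number of directions whose curvature exceeds $\lambda$.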

๐Ÿ“ Abstract
We study trade-offs between the population risk curvature, the geometry of the noise, and preconditioning on the generalisation ability of multipass Preconditioned Stochastic Gradient Descent (PSGD). Many practical optimisation heuristics implicitly navigate this trade-off in different ways -- for instance, some aim to whiten gradient noise, while others aim to align updates with expected loss curvature. When the geometry of the population risk curvature and the geometry of the gradient noise do not match, an aggressive choice that improves one aspect can amplify instability along the other, leading to suboptimal statistical behaviour. In this paper we employ on-average algorithmic stability to connect the generalisation of PSGD to an effective dimension that depends on these sources of curvature. While existing techniques for on-average stability of SGD are limited to a single pass, as a first contribution we develop a new on-average stability analysis for multipass SGD that handles the correlations induced by data reuse. This allows us to derive excess risk bounds that depend on the effective dimension. In particular, we show that an improperly chosen preconditioner can yield suboptimal effective dimension dependence in both optimisation and generalisation. Finally, we complement our upper bounds with matching, instance-dependent lower bounds.
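To make the object of study concrete, here is a minimal sketch of multipass PSGD on a noiseless least-squares problem. The function name `psgd`, the toy data, and the choice of a curvature-aligned preconditioner `P` (the inverse empirical Hessian) are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def psgd(X, y, P, lr=0.1, epochs=10, seed=0):
    """Multipass preconditioned SGD on the least-squares loss.

    P is a fixed positive-definite preconditioner; each update is
    w <- w - lr * P @ g, where g is a single-sample stochastic gradient.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):              # multiple passes reuse the same data
        for i in rng.permutation(n):     # shuffle once per epoch
            g = (X[i] @ w - y[i]) * X[i]  # per-sample gradient
            w -= lr * P @ g
    return w

# Toy problem with anisotropic curvature: the second feature is scaled by 10
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2)) * np.array([1.0, 10.0])
w_star = np.array([1.0, -1.0])
y = X @ w_star                           # noiseless targets

# Curvature-aligned preconditioner: inverse of the empirical Hessian
H = X.T @ X / len(X)
P = np.linalg.inv(H)
w_hat = psgd(X, y, P)
```

With this choice of `P` the preconditioned problem is well conditioned, so the iterates recover `w_star` quickly; replacing `P` with the identity leaves the updates exposed to the ill-conditioned curvature, the situation the abstract's "improperly chosen preconditioner" warning refers to.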
Problem

Research questions and friction points this paper is trying to address.

multipass PSGD
generalisation
on-average stability
effective dimension
preconditioning
Innovation

Methods, ideas, or system contributions that make the work stand out.

on-average stability
multipass SGD
preconditioning
effective dimension
generalization bounds
🔎 Similar Papers
No similar papers found.