Generalisation under gradient descent via deterministic PAC-Bayes

📅 2022-09-06
📈 Citations: 3
Influential: 0
🤖 AI Summary
PAC-Bayesian generalisation theory has traditionally relied on stochastic optimisation assumptions, leaving it inapplicable to deterministic training algorithms such as gradient descent (GD), momentum methods, and damped Hamiltonian dynamics. Method: We propose a computable, randomisation-free (disintegrated) PAC-Bayes bound for deterministic gradient-based optimisers. Our approach combines gradient-flow modelling, analysis of the Hessian of the training objective along the optimisation trajectory, and the density of the initial parameter distribution. This yields an explicit, fully computable upper bound on generalisation error that depends only on the initial density and the spectral properties of the Hessian along the training trajectory, without introducing auxiliary randomness or approximations. Results: Empirical evaluation across a range of deterministic and stochastic optimisers supports both the tightness and the practical utility of the bound. In doing so, the work removes the implicit stochasticity requirement of classical PAC-Bayes theory and provides a rigorous, computable generalisation guarantee for deterministic training.
📝 Abstract
We establish disintegrated PAC-Bayesian generalisation bounds for models trained with gradient descent methods or continuous gradient flows. Contrary to standard practice in the PAC-Bayesian setting, our result applies to optimisation algorithms that are deterministic, without requiring any de-randomisation step. Our bounds are fully computable, depending on the density of the initial distribution and the Hessian of the training objective over the trajectory. We show that our framework can be applied to a variety of iterative optimisation algorithms, including stochastic gradient descent (SGD), momentum-based schemes, and damped Hamiltonian dynamics.
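To make the abstract's key ingredient concrete, the sketch below records the quantity the bound depends on, the spectrum of the Hessian along a gradient-descent trajectory, for a toy objective. This is only an illustration of that ingredient, not the paper's bound computation; the objective, step size, and all names here are invented for the example.

```python
import numpy as np

# Illustrative toy objective: quadratic term plus a quartic penalty.
# A, the 0.1 coefficient, and the learning rate are arbitrary choices.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

def loss(w):
    return 0.5 * w @ A @ w + 0.1 * np.sum(w**4)

def grad(w):
    return A @ w + 0.4 * w**3

def hessian(w):
    # Hessian of the toy objective: constant part plus a diagonal
    # contribution from the quartic term.
    return A + np.diag(1.2 * w**2)

# Plain (deterministic) gradient descent, recording the largest Hessian
# eigenvalue at each iterate -- the spectral quantity tracked along the
# training trajectory.
w = np.array([1.0, -1.0])
lr = 0.05
max_eigs = []
for _ in range(100):
    max_eigs.append(np.linalg.eigvalsh(hessian(w)).max())
    w = w - lr * grad(w)
```

In this toy run the iterates contract toward the origin, so the recorded curvature decreases along the trajectory as the quartic contribution vanishes; in the paper's setting such a spectral trace, together with the density of the initial distribution, is what enters the computable bound.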
Problem

Research questions and friction points this paper is trying to address.

Generalisation bounds for gradient descent
Deterministic optimisation without de-randomisation
Applicable to various iterative algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deterministic PAC-Bayesian bounds
Computable gradient descent metrics
Applicable to various optimisation algorithms