🤖 AI Summary
The PAC-Bayes theory lacks systematic, accessible introductory resources for newcomers.
Method: This paper builds a pedagogical framework for beginners: (i) it unifies and clarifies the equivalence between Catoni's localization technique and mutual information bounds, proposing a simpler, reproducible derivation; (ii) it explicitly positions PAC-Bayes as a theoretical bridge between Bayesian inference and frequentist generalization analysis; and (iii) it integrates the core methodological components (distributional modeling, KL-divergence constraints, randomized predictor construction, and information-theoretic bound derivation) into a coherent, understandable, and reproducible teaching system.
Contribution/Results: The work fills a critical gap in foundational PAC-Bayes pedagogy and provides both theoretical grounding and practical methodology for applications such as posterior compression of neural networks in deep learning. It is well placed to serve as a standard introductory reference for the PAC-Bayes community.
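The kind of bound the summary alludes to can be illustrated by a McAllester-type inequality, stated here as a sketch in standard notation (not necessarily the paper's exact formulation): with a prior $\pi$ fixed before seeing the $n$ data points, the population risk $R(\theta)$ of a randomized predictor drawn from any posterior $\rho$ is controlled by its empirical risk $r(\theta)$ plus a KL-divergence penalty.

```latex
% McAllester-type PAC-Bayes bound (standard form; a sketch, not the paper's exact statement).
% R(\theta): population risk; r(\theta): empirical risk on n samples;
% \pi: data-free prior; \rho: any (possibly data-dependent) posterior; \delta \in (0,1).
\[
\mathbb{P}\!\left( \forall \rho:\;
  \mathbb{E}_{\theta\sim\rho}\!\left[R(\theta)\right]
  \le \mathbb{E}_{\theta\sim\rho}\!\left[r(\theta)\right]
  + \sqrt{\frac{\mathrm{KL}(\rho\,\|\,\pi) + \log\frac{2\sqrt{n}}{\delta}}{2n}}
\right) \ge 1 - \delta .
\]
```

The bound holds uniformly over posteriors $\rho$, which is what makes data-dependent posteriors (and hence applications such as the neural-network bounds mentioned above) possible.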
📝 Abstract
Aggregated predictors are obtained by making a set of basic predictors vote according to some weights, that is, according to some probability distribution. Randomized predictors are obtained by sampling in a set of basic predictors, according to some prescribed probability distribution. Thus, aggregated and randomized predictors have in common that they are not defined by a minimization problem, but by a probability distribution on the set of predictors. In statistical learning theory, there is a set of tools designed to understand the generalization ability of such procedures: PAC-Bayesian or PAC-Bayes bounds. Since the original PAC-Bayes bounds of D. McAllester, these tools have been considerably improved in many directions (we will, for example, describe a simplified version of the localization technique of O. Catoni that was missed by the community, and later rediscovered as "mutual information bounds"). Very recently, PAC-Bayes bounds received considerable attention: for example, there was a workshop on PAC-Bayes at NIPS 2017, "(Almost) 50 Shades of Bayesian Learning: PAC-Bayesian trends and insights", organized by B. Guedj, F. Bach and P. Germain. One of the reasons for this recent success is the successful application of these bounds to neural networks by G. Dziugaite and D. Roy. An elementary introduction to PAC-Bayes theory is still missing. This is an attempt to provide such an introduction.
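The distinction the abstract draws between aggregated and randomized predictors can be sketched in a toy example. Here the basic predictors are hypothetical 1-D threshold classifiers, and `rho` stands for the probability distribution over them; all names are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
thresholds = np.array([-0.5, 0.0, 0.5])  # basic predictors h_i(x) = sign(x - t_i)
rho = np.array([0.2, 0.5, 0.3])          # probability distribution over the predictors

def basic_predictions(x):
    """Predictions of each basic predictor on input x, in {-1, +1}."""
    return np.where(x - thresholds > 0, 1, -1)

def aggregated_predict(x):
    """Aggregated predictor: the basic predictors vote with weights rho."""
    return int(np.sign(rho @ basic_predictions(x)))

def randomized_predict(x):
    """Randomized (Gibbs) predictor: draw one basic predictor from rho, use it."""
    i = rng.choice(len(thresholds), p=rho)
    return int(basic_predictions(x)[i])

print(aggregated_predict(0.25))  # deterministic weighted vote
print(randomized_predict(0.25))  # random draw; output varies with the sample
```

Both procedures are specified by the distribution `rho` rather than by minimizing a loss, which is exactly the setting PAC-Bayes bounds are designed for.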