Overfitting and Generalizing with (PAC) Bayesian Prediction in Noisy Binary Classification

📅 2026-03-23
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work addresses the overfitting of conventional Bayesian predictors in noisy binary classification, where insufficient regularization leads to non-vanishing excess risk. The authors propose a PAC-Bayes learning rule that controls regularization strength by balancing the training error of a randomized posterior predictor against its KL divergence from a prior. The key contribution lies in elucidating how the trade-off parameter λ governs generalization: choosing λ ≫ 1 (growing with the sample size), which can be viewed as employing a sample-size-dependent prior, guarantees that the excess risk vanishes uniformly even under agnostic label noise. This result extends discrete-prior analyses to the continuous Bayesian setting and, through a refined two-part-code MDL perspective, quantitatively delineates the boundary between under- and over-regularization, thereby mitigating the overfitting inherent in traditional Bayesian prediction.
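
A minimal sketch of the learning rule described above, in assumed notation (the symbols $\pi$, $\rho$, and $\hat{L}_S$ are illustrative, not taken from the paper): given a sample $S$ of size $n$, a prior $\pi$ over predictors, and the empirical error $\hat{L}_S(\rho) = \mathbb{E}_{h \sim \rho}\big[\tfrac{1}{n}\sum_{i=1}^{n} \mathbb{1}[h(x_i) \neq y_i]\big]$ of a randomized predictor $\rho$, the rule selects

$$\hat{\rho} \in \arg\min_{\rho} \; n\,\hat{L}_S(\rho) + \lambda\,\mathrm{KL}(\rho \,\|\, \pi).$$

With $\lambda = 1$ this is the familiar Gibbs/Bayes-posterior trade-off; $\lambda \gg 1$ weights the KL term more heavily, which is the regularization regime shown to yield uniformly vanishing excess risk (the exact placement of $\lambda$ and $n$ may differ from the paper's normalization).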

📝 Abstract
We consider a PAC-Bayes-type learning rule for binary classification, balancing the training error of a randomized "posterior" predictor with its KL divergence to a pre-specified "prior". This can be seen as an extension of a modified two-part-code Minimum Description Length (MDL) learning rule to continuous priors and randomized predictions. With a balancing parameter of $\lambda = 1$ this learning rule recovers an (empirical) Bayes posterior, and a modified variant recovers the profile posterior, linking it with standard Bayesian prediction (up to the treatment of the single-parameter noise level). However, from a risk-minimization perspective, this Bayesian predictor overfits and can lead to non-vanishing excess loss in the agnostic case. Instead, a choice of $\lambda \gg 1$, which can be seen as using a sample-size-dependent prior, ensures uniformly vanishing excess loss even in the agnostic case. We precisely characterize the effect of under-regularizing (and over-regularizing) as a function of the balance parameter $\lambda$, identifying the regimes in which under-regularization is tempered or catastrophic. This work extends previous work by Zhu and Srebro [2025], which considered only discrete priors, to PAC-Bayes-type learning rules and, through their rigorous Bayesian interpretation, to Bayesian prediction more generally.
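
For intuition, a minimal runnable sketch of this rule in the discrete-prior setting of Zhu and Srebro [2025]: over a finite hypothesis class, the minimizer of $n\,\hat{L}_S(\rho) + \lambda\,\mathrm{KL}(\rho \,\|\, \pi)$ has the closed form $\rho(h) \propto \pi(h)\,\exp(-n\hat{L}_S(h)/\lambda)$. The function name, toy error values, and sample size below are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

def gibbs_posterior(prior, emp_errors, n, lam):
    """Closed-form minimizer of n*E_rho[err] + lam*KL(rho || prior) over a
    finite hypothesis class: rho(h) proportional to prior(h)*exp(-n*err(h)/lam).
    lam = 1 matches the Bayes-posterior regime; lam >> 1 regularizes toward
    the prior (illustrative normalization; the paper's may differ)."""
    log_w = np.log(prior) - n * emp_errors / lam
    log_w -= log_w.max()          # subtract max for numerical stability
    w = np.exp(log_w)
    return w / w.sum()

# Toy example: three hypotheses, uniform prior, n = 100 samples.
prior = np.full(3, 1.0 / 3.0)
emp_errors = np.array([0.10, 0.12, 0.40])   # empirical 0-1 errors (hypothetical)
for lam in (1.0, 10.0, 100.0):
    print(lam, gibbs_posterior(prior, emp_errors, n=100, lam=lam).round(3))
```

As $\lambda$ grows, the posterior shifts from concentrating on the empirical-risk minimizer back toward the prior, mirroring the under-/over-regularization trade-off the paper characterizes.
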
Problem

Research questions and friction points this paper is trying to address.

overfitting
PAC-Bayes
binary classification
agnostic learning
excess loss

Innovation

Methods, ideas, or system contributions that make the work stand out.

PAC-Bayes
binary classification
overfitting
regularization parameter
agnostic learning