Bayesian inference for the learning rate in Generalised Bayesian inference

📅 2025-06-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In generalized Bayesian inference (GBI), inference hyperparameters—particularly the learning rate—are challenging to jointly estimate with model parameters, and existing approaches lack principled uncertainty quantification. Method: We propose the first hold-out-data-based Bayesian learning rate inference framework. Theoretically, we derive the Bayesian posterior over the learning rate and establish its asymptotic convergence to the optimal value under mild regularity conditions. Methodologically, we treat the learning rate as a random variable and perform joint posterior inference over both the learning rate and model parameters via generalized Bayesian updating, supporting modular loss modeling and full uncertainty quantification. Results: Experiments demonstrate that our framework significantly outperforms standard Bayesian inference on synthetic data; automatically selects optimal or near-optimal learning rates in large-scale text analysis tasks; and improves predictive performance in multi-dataset fusion settings.

Technology Category

Application Category

📝 Abstract
In Generalised Bayesian Inference (GBI), the learning rate and hyperparameters of the loss must be estimated. However, these inference-hyperparameters can't be estimated jointly with the other parameters by giving them a prior, as we discuss. Several methods for estimating the learning rate have been given which elicit and minimise a loss based on the goals of the overall inference (in our case, prediction of new data). However, in some settings there exists an unknown ``true'' learning rate about which it is meaningful to have prior belief and it is then possible to use Bayesian inference with held out data to get a posterior for the learning rate. We give conditions under which this posterior concentrates on the optimal rate and suggest hyperparameter estimators derived from this posterior. The new framework supports joint estimation and uncertainty quatification for inference hyperparameters. Experiments show that the resulting GBI-posteriors out-perform Bayesian inference on simulated test data and select optimal or near optimal hyperparameter values in a large real problem of text analysis. Generalised Bayesian inference is particularly useful for combining multiple data sets and most of our examples belong to that setting. As a side note we give asymptotic results for some of the special ``multi-modular'' Generalised Bayes posteriors, which we use in our examples.
Problem

Research questions and friction points this paper is trying to address.

Estimating learning rate in Generalised Bayesian Inference
Bayesian inference for hyperparameters with held-out data
Joint estimation and uncertainty quantification for hyperparameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian inference for learning rate estimation
Posterior concentration on optimal learning rate
Joint hyperparameter estimation and uncertainty quantification
🔎 Similar Papers
No similar papers found.
J
Jeong Eun Lee
Department of Statistics, University of Auckland, Auckland, NZ
Sitong Liu
Sitong Liu
Duke University
G
Geoff K. Nicholls
Department of Statistics, University of Oxford, Oxford, UK