🤖 AI Summary
Bayesian hyperparameter optimization (HPO) in large-scale machine learning suffers from high computational cost and the difficulty of posterior sampling. To address this, we propose a generative hyperparameter tuning framework: first, we employ weighted Bayesian bootstrapping to efficiently approximate the hyperparameter posterior distribution; second, we learn a transport map from hyperparameters to the corresponding optimizers, yielding a lookup-table-based generative estimator. This work is the first to introduce generative modeling into Bayesian HPO, enabling amortized optimization over the hyperparameter space. The method supports millisecond-scale evaluation and uncertainty quantification over both continuous and discrete hyperparameter grids. Experiments demonstrate a 10–100× speedup in search time over conventional Bayesian optimization, while achieving superior generalization performance and well-calibrated uncertainty estimates across multiple benchmarks.
📝 Abstract
Hyper-parameter selection is a central practical problem in modern machine learning, governing regularization strength, model capacity, and robustness choices. Cross-validation is often computationally prohibitive at scale, while fully Bayesian hyper-parameter learning can be difficult due to the cost of posterior sampling. We develop a generative perspective on hyper-parameter tuning that combines two ideas: (i) optimization-based approximations to Bayesian posteriors via randomized, weighted objectives (the weighted Bayesian bootstrap), and (ii) amortization of repeated optimization across many hyper-parameter settings by learning a transport map from hyper-parameters (including random weights) to the corresponding optimizer. This yields a "generator look-up table" for estimators, enabling rapid evaluation over grids or continuous ranges of hyper-parameters and supporting both predictive tuning objectives and approximate Bayesian uncertainty quantification. We connect this viewpoint to weighted $M$-estimation, envelope/auxiliary-variable representations that reduce non-quadratic losses to weighted least squares, and recent generative samplers for weighted $M$-estimators.
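To make the core idea concrete, here is a minimal sketch (not the paper's implementation) of the weighted-Bayesian-bootstrap half of the framework for ridge regression, where the weighted $M$-estimator has a closed form. For each hyper-parameter value on a grid, repeatedly drawing random Exp(1) weights and solving the weighted least-squares problem yields approximate posterior draws, which together form a simple "look-up table" of estimators with uncertainty. All names and the synthetic data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (illustrative).
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

def weighted_ridge(X, y, w, lam):
    """Weighted M-estimator for squared loss with a ridge penalty:
    argmin_b  sum_i w_i (y_i - x_i' b)^2 + lam * ||b||^2,
    which reduces to a weighted least-squares solve."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw + lam * np.eye(X.shape[1]), Xw.T @ y)

lambdas = np.logspace(-2, 2, 10)  # hyper-parameter grid
B = 100                           # number of bootstrap draws

# "Look-up table": approximate posterior draws for each hyper-parameter.
table = np.empty((len(lambdas), B, p))
for j, lam in enumerate(lambdas):
    for b in range(B):
        w = rng.exponential(size=n)  # random Exp(1) weights
        table[j, b] = weighted_ridge(X, y, w, lam)

# Point estimates and uncertainty as functions of the hyper-parameter.
post_mean = table.mean(axis=1)
post_sd = table.std(axis=1)
```

In the full framework, the inner per-draw optimization is amortized by a learned transport map from $(\lambda, w)$ to the optimizer, so new hyper-parameter settings can be evaluated without re-solving; the closed-form solve above stands in for that map in this toy setting.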