Learning Hyperparameters via a Data-Emphasized Variational Objective

📅 2025-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Grid search for tuning the regularization hyperparameters of large deep models is computationally expensive, depends on a held-out validation set, and requires manually enumerating candidate configurations. This paper addresses all three problems with an end-to-end differentiable framework that jointly optimizes regularization hyperparameters and model parameters directly on the full training set. Its core contribution is a data-emphasized variational evidence lower bound (ELBo) that upweights the data likelihood relative to the prior, enabling validation-free, enumeration-free, fully gradient-driven hyperparameter learning. The framework further extends to efficient learning of length-scale hyperparameters in Gaussian process kernels. In image transfer learning experiments, the method reduces tuning time from over 88 hours (grid search) to under three hours while maintaining comparable accuracy, and it improves both the computational efficiency and modeling flexibility of Gaussian process approximations.

📝 Abstract
When training large flexible models, practitioners often rely on grid search to select hyperparameters that control over-fitting. This grid search has several disadvantages: the search is computationally expensive, requires carving out a validation set that reduces the available data for training, and requires users to specify candidate values. In this paper, we propose an alternative: directly learning regularization hyperparameters on the full training set via the evidence lower bound ("ELBo") objective from variational methods. For deep neural networks with millions of parameters, we recommend a modified ELBo that upweights the influence of the data likelihood relative to the prior. Our proposed technique overcomes all three disadvantages of grid search. In a case study on transfer learning of image classifiers, we show how our method reduces the 88+ hour grid search of past work to under 3 hours while delivering comparable accuracy. We further demonstrate how our approach enables efficient yet accurate approximations of Gaussian processes with learnable length-scale kernels.
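The core idea (a likelihood term upweighted by a multiplier, optimized jointly with the prior precision by gradient ascent) can be sketched on a toy Bayesian linear regression. This is an illustrative assumption-laden sketch, not the paper's implementation: the multiplier `kappa`, the unit-variance Gaussian likelihood, the closed-form gradients, and all variable names are choices made here for brevity.

```python
import math

# Toy data: y = 2x exactly, so the learned posterior mean should approach 2.
xs = [i / 10 for i in range(1, 21)]
ys = [2.0 * x for x in xs]
A = sum(x * x for x in xs)               # sum of x_i^2
B = sum(x * y for x, y in zip(xs, ys))   # sum of x_i * y_i

# Data-emphasized ELBo: kappa * E_q[log p(y|x,w)] - KL(q(w) || p(w)),
# with kappa > 1 upweighting the likelihood (kappa = 1 is the standard ELBo).
kappa = 10.0
lr = 0.005
# q(w) = N(m, s^2) with s = exp(phi); prior p(w) = N(0, 1/lam) with lam = exp(rho).
m, phi, rho = 0.0, 0.0, 0.0

for _ in range(5000):
    s, lam = math.exp(phi), math.exp(rho)
    # Closed-form gradients of the objective w.r.t. (m, phi, rho),
    # assuming a Gaussian likelihood with unit noise variance.
    dm = kappa * (B - m * A) - lam * m
    dphi = -(kappa * A + lam) * s * s + 1.0
    drho = -0.5 * (lam * (m * m + s * s) - 1.0)
    m += lr * dm       # variational mean
    phi += lr * dphi   # log of variational std dev
    rho += lr * drho   # log of prior precision (the hyperparameter being learned)

s, lam = math.exp(phi), math.exp(rho)
```

Note that the prior precision `lam` is learned by the same gradient loop as the variational parameters, with no validation set and no candidate grid; this is the structural point of the paper's approach, here reduced to a one-weight model.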
Problem

Research questions and friction points this paper is trying to address.

Manual hyperparameter selection does not scale to large models
Grid search is computationally expensive
Carving out a validation set reduces the data available for training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns regularization hyperparameters directly via the ELBo on the full training set
Modifies the ELBo to upweight the data likelihood relative to the prior
Cuts an 88+ hour grid search to under 3 hours with comparable accuracy