🤖 AI Summary
This study investigates whether default hyperparameters in machine learning libraries serve as effective initial points for Bayesian optimization to accelerate convergence. The authors conduct the first large-scale empirical evaluation by initializing optimization with samples drawn from a truncated Gaussian distribution centered around default values and comparing this strategy against uniform random initialization. Experiments span three optimization frameworks—BoTorch, Optuna, and Scikit-Optimize—combined with Random Forest, SVM, and MLP models across five standard datasets. Results show that default hyperparameters do not yield statistically significant performance improvements (p = 0.141–0.908), and any early advantage they confer dissipates as optimization progresses. These findings suggest that default values lack informative prior knowledge, challenging the common heuristic of using them as starting points in hyperparameter optimization.
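The initialization strategy described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the parameter bounds, the default value, and the prior width `sigma` are hypothetical, and scipy's `truncnorm` standardized-bounds parameterization is used.

```python
import numpy as np
from scipy.stats import truncnorm

def sample_default_informed(default, low, high, sigma, n, seed=None):
    """Draw n initial points from a Gaussian centered at `default`,
    truncated to the search interval [low, high]."""
    # truncnorm expects bounds standardized by loc/scale
    a, b = (low - default) / sigma, (high - default) / sigma
    return truncnorm.rvs(a, b, loc=default, scale=sigma, size=n,
                         random_state=np.random.default_rng(seed))

def sample_uniform(low, high, n, seed=None):
    """Baseline: uniform random initialization over [low, high]."""
    return np.random.default_rng(seed).uniform(low, high, size=n)

# Hypothetical example: n_estimators for a Random Forest, default 100,
# searched over [10, 500] (bounds chosen for illustration only).
informed = sample_default_informed(default=100, low=10, high=500, sigma=50, n=5, seed=0)
baseline = sample_uniform(low=10, high=500, n=5, seed=0)
```

Both samplers respect the search bounds; the informed one simply concentrates the initial design near the library default before BO takes over.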
📝 Abstract
Bayesian Optimization (BO) is a standard tool for hyperparameter tuning thanks to its sample efficiency on expensive black-box functions. While most BO pipelines begin with uniform random initialization, default hyperparameter values shipped with popular ML libraries such as scikit-learn encode implicit expert knowledge and could serve as informative starting points that accelerate convergence. This hypothesis, despite its intuitive appeal, has remained largely unexamined. We formalize the idea by initializing BO with points drawn from truncated Gaussian distributions centered at library defaults and compare the resulting trajectories against a uniform-random baseline. We conduct an extensive empirical evaluation spanning three BO back-ends (BoTorch, Optuna, Scikit-Optimize), three model families (Random Forests, Support Vector Machines, Multilayer Perceptrons), and five benchmark datasets covering classification and regression tasks. Performance is assessed through convergence speed and final predictive quality, and statistical significance is determined via one-sided binomial tests. Across all conditions, default-informed initialization yields no statistically significant advantage over purely random sampling, with p-values ranging from 0.141 to 0.908. A sensitivity analysis on the prior variance confirms that, while tighter concentration around the defaults improves early evaluations, this transient benefit vanishes as optimization progresses, leaving final performance unchanged. Our results provide no evidence that default hyperparameters encode directional information useful for guiding optimization. We therefore recommend that practitioners treat hyperparameter tuning as an integral part of model development and favor principled, data-driven search strategies over heuristic reliance on library defaults.
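The one-sided binomial test mentioned above can be illustrated as follows. This is a hedged sketch: the win counts are invented for demonstration and are not the paper's data; only the test construction mirrors the described methodology (pairing the two strategies and counting how often the informed one wins).

```python
from scipy.stats import binomtest

# Hypothetical paired comparison: in 15 matched runs, the default-informed
# initialization achieved the better final score 9 times.
wins, trials = 9, 15

# Under H0 (no advantage), wins ~ Binomial(trials, 0.5).
# alternative="greater" asks whether the informed strategy wins
# more often than chance.
result = binomtest(wins, trials, p=0.5, alternative="greater")
```

With these illustrative counts the p-value is well above 0.05, i.e. 9 wins out of 15 is entirely consistent with a coin flip, which is the same qualitative conclusion the study reports across its conditions.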