🤖 AI Summary
This paper addresses the poor scalability and high internal computational cost of Bayesian optimization (BO) on high-dimensional mixed-category black-box optimization (MC-BBO) problems. To this end, the authors propose CatCMA, an efficient probabilistic search method based on natural gradient descent. Methodologically, CatCMA introduces a joint parameterization that unifies multivariate Gaussian and categorical distributions into a single search distribution; updates its parameters in the natural gradient direction, augmented with acceleration techniques from the covariance matrix adaptation evolution strategy (CMA-ES), including step-size adaptation and learning-rate adaptation; and restricts the categorical distribution parameters by an analytically derived margin to prevent premature convergence. The contributions are threefold: (i) significantly improved convergence speed and robustness for high-dimensional mixed-variable optimization involving continuous, integer, and unordered categorical variables; (ii) consistent outperformance of state-of-the-art BO methods across multiple benchmark tasks; and (iii) markedly slower performance degradation as the problem dimension grows.
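The natural-gradient update for the categorical part of the search distribution has a simple closed form: when a categorical distribution is parameterized directly by its probability vector, the natural gradient of the log-likelihood at a sample is the one-hot encoding of that sample minus the current probabilities. A minimal sketch under that assumption (variable names are illustrative, not from the paper, and the full method also adapts the Gaussian part, the step-size, and the learning rate):

```python
import numpy as np

def natural_grad_update(q, samples, weights, eta):
    """One natural-gradient step for a categorical distribution.

    For a categorical distribution with probability vector q, the natural
    gradient of log p(c) is (onehot(c) - q), so the update reads
        q <- q + eta * sum_i w_i * (onehot(c_i) - q).
    With nonnegative weights summing to 1 and eta <= 1, the result is
    again a valid probability vector.
    """
    K = len(q)
    grad = np.zeros(K)
    for c, w in zip(samples, weights):
        onehot = np.zeros(K)
        onehot[c] = 1.0
        grad += w * (onehot - q)
    return q + eta * grad

# Toy usage: three samples ranked by fitness, better samples weighted more.
q = np.full(3, 1.0 / 3.0)
q_new = natural_grad_update(q, samples=[0, 0, 1], weights=[0.5, 0.3, 0.2], eta=0.5)
```

The update shifts probability mass toward categories drawn by well-ranked samples while keeping the parameters on the probability simplex.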
📝 Abstract
Black-box optimization problems often require simultaneously optimizing different types of variables, such as continuous, integer, and categorical variables. Unlike integer variables, categorical variables do not necessarily have a meaningful order, so the approach of discretizing continuous variables does not work well for them. Although several Bayesian optimization methods can deal with mixed-category black-box optimization (MC-BBO), they suffer from a lack of scalability to high-dimensional problems and from high internal computational cost. This paper proposes CatCMA, a stochastic optimization method for MC-BBO problems, which employs the joint probability distribution of multivariate Gaussian and categorical distributions as the search distribution. CatCMA updates the parameters of the joint probability distribution in the natural gradient direction. CatCMA also incorporates the acceleration techniques used in the covariance matrix adaptation evolution strategy (CMA-ES) and the stochastic natural gradient method, such as step-size adaptation and learning rate adaptation. In addition, we restrict the ranges of the categorical distribution parameters by a margin to prevent premature convergence and analytically derive a promising margin setting. Numerical experiments show that the performance of CatCMA is superior to that of state-of-the-art Bayesian optimization algorithms and more robust to increasing problem dimension.
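As a rough illustration of the two ingredients described above, the sketch below draws candidates from a joint Gaussian × categorical search distribution and keeps the categorical probabilities away from zero with a lower-bound margin. The function names and the simple clip-and-renormalize correction are illustrative assumptions; the paper analytically derives a margin setting rather than using this simplification.

```python
import numpy as np

def sample_joint(mean, sigma, cov, q, rng):
    """Draw one mixed candidate from the joint search distribution:
    a multivariate Gaussian N(mean, sigma^2 * cov) for the continuous
    part, and one independent categorical draw per probability vector
    in q for the categorical part."""
    x = rng.multivariate_normal(mean, (sigma ** 2) * cov)
    c = np.array([rng.choice(len(qi), p=qi) for qi in q])
    return x, c

def apply_margin(q, alpha):
    """Simplified margin correction: clip every categorical probability
    below by alpha, then renormalize each vector to sum to 1. This keeps
    all categories sampleable and so guards against premature convergence
    of the categorical distribution."""
    corrected = []
    for qi in q:
        qi = np.maximum(qi, alpha)
        corrected.append(qi / qi.sum())
    return corrected

# Toy usage: 2 continuous dimensions, two categorical variables (2 and 3 categories).
rng = np.random.default_rng(0)
q = apply_margin([np.array([0.999, 0.001]), np.array([0.2, 0.3, 0.5])], alpha=0.02)
x, c = sample_joint(np.zeros(2), 0.5, np.eye(2), q, rng)
```

Without the margin, a probability that collapses to 0 can never recover, because the natural-gradient update only reweights categories that are actually sampled.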