🤖 AI Summary
Multi-objective Bayesian optimization (MOBO) suffers from identifiability issues in hypervolume estimation due to non-Markovian dependencies in the acquisition process. Method: This work models MOBO as a non-Markovian reinforcement learning problem and introduces the first sequence-based deep Q-learning framework for MOBO. It integrates a Transformer architecture to capture historical dependencies, Gaussian process surrogate models, an adaptive hypervolume-based reward function, and a non-Markovian RL policy, thereby relaxing the restrictive Markov assumption inherent in conventional approaches. Contribution/Results: The proposed method achieves significant improvements over both rule-based and learning-based baselines on synthetic benchmarks and real-world multi-objective hyperparameter tuning tasks. To foster reproducibility and community advancement, the implementation is publicly released as open-source software.
📄 Abstract
Bayesian optimization (BO) offers an efficient pipeline for optimizing black-box functions with the help of a Gaussian process prior and an acquisition function (AF). Recently, in the context of single-objective BO, learning-based AFs have shown promising empirical results owing to their favorable non-myopic nature. Despite this, the direct extension of these approaches to multi-objective Bayesian optimization (MOBO) suffers from the *hypervolume identifiability issue*, which results from the non-Markovian nature of MOBO problems. To tackle this, inspired by the non-Markovian RL literature and the success of Transformers in language modeling, we present a generalized deep Q-learning framework and propose *BOFormer*, which substantiates this framework for MOBO via sequence modeling. Through extensive evaluation, we demonstrate that BOFormer consistently outperforms the benchmark rule-based and learning-based algorithms on various synthetic MOBO and real-world multi-objective hyperparameter optimization problems. We have made the source code publicly available to encourage further research in this direction.
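The hypervolume-based reward underlying such frameworks is built on hypervolume improvement: the growth of the objective-space region dominated by the observed points after each new query. As a minimal sketch of that core quantity (not the paper's adaptive reward), here is an exact 2D computation, assuming a maximization problem with a fixed reference point; the function names are illustrative:

```python
def hypervolume_2d(points, ref):
    """Exact hypervolume of a 2D point set (maximization) w.r.t. a reference point.

    The dominated region is a union of axis-aligned rectangles; sorting by the
    first objective in descending order lets us sweep it in one pass.
    """
    # Keep only points that strictly dominate the reference point.
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    hv, prev_y = 0.0, ref[1]
    for x, y in sorted(pts, key=lambda p: p[0], reverse=True):
        if y > prev_y:  # points dominated by an earlier point add no area
            hv += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return hv

def hv_improvement_reward(observed, new_point, ref=(0.0, 0.0)):
    """Per-step reward: gain in dominated hypervolume from querying new_point."""
    before = hypervolume_2d(observed, ref)
    after = hypervolume_2d(observed + [new_point], ref)
    return after - before
```

For example, with observed objective vectors `[(3, 1), (2, 2), (1, 3)]` and reference point `(0, 0)`, the dominated hypervolume is 6.0, and a new query landing at `(2.5, 2.5)` yields a positive reward because it extends the Pareto front.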