BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL

📅 2025-05-28
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Multi-objective Bayesian optimization (MOBO) suffers from an identifiability issue in hypervolume estimation that stems from non-Markovian dependencies in the acquisition process. Method: This work models MOBO as a non-Markovian reinforcement learning problem and introduces the first sequence-based deep Q-learning framework for MOBO. It integrates a Transformer architecture to capture historical dependencies, Gaussian process surrogate models, an adaptive hypervolume-based reward function, and a non-Markovian RL policy, thereby relaxing the restrictive Markov assumption of conventional approaches. Contribution/Results: The proposed method achieves significant improvements over both rule-based and learning-based baselines on synthetic benchmarks and real-world multi-objective hyperparameter tuning tasks. The implementation is publicly released as open-source software to foster reproducibility and further research.
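The hypervolume-based reward mentioned above can be illustrated with a minimal two-objective sketch. This is a hypothetical simplification, not the paper's implementation: the reward for one BO step is taken as the hypervolume improvement contributed by the newly evaluated point, with the hypervolume computed by a simple sweep (maximization, reference point at the lower-left).

```python
def hypervolume_2d(points, ref):
    """Hypervolume (area) dominated by a set of 2-objective points
    with respect to a reference point, assuming maximization."""
    # Discard points that do not dominate the reference point.
    pts = [(x, y) for x, y in points if x > ref[0] and y > ref[1]]
    if not pts:
        return 0.0
    # Sweep from the largest first objective; each point adds a strip
    # of area above the best second objective seen so far.
    pts.sort(key=lambda p: p[0], reverse=True)
    hv, best_y = 0.0, ref[1]
    for x, y in pts:
        if y > best_y:
            hv += (x - ref[0]) * (y - best_y)
            best_y = y
    return hv

def hv_improvement_reward(history, new_point, ref):
    """Toy per-step RL reward: hypervolume gained by the new evaluation
    (the paper's reward is an adaptive variant of this idea)."""
    return (hypervolume_2d(history + [new_point], ref)
            - hypervolume_2d(history, ref))
```

For example, with evaluated points (3, 1) and (2, 2), querying (1, 3) earns a positive reward, while a dominated point such as (1, 1) earns zero, which is what drives the policy toward the Pareto front.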

๐Ÿ“ Abstract
Bayesian optimization (BO) offers an efficient pipeline for optimizing black-box functions with the help of a Gaussian process prior and an acquisition function (AF). Recently, in the context of single-objective BO, learning-based AFs have witnessed promising empirical results given their favorable non-myopic nature. Despite this, the direct extension of these approaches to multi-objective Bayesian optimization (MOBO) suffers from the hypervolume identifiability issue, which results from the non-Markovian nature of MOBO problems. To tackle this, inspired by the non-Markovian RL literature and the success of Transformers in language modeling, we present a generalized deep Q-learning framework and propose BOFormer, which substantiates this framework for MOBO via sequence modeling. Through extensive evaluation, we demonstrate that BOFormer consistently outperforms the benchmark rule-based and learning-based algorithms in various synthetic MOBO and real-world multi-objective hyperparameter optimization problems. We have made the source code publicly available to encourage further research in this direction.
Problem

Research questions and friction points this paper is trying to address.

Addressing hypervolume identifiability in multi-objective Bayesian optimization
Extending non-Markovian RL to solve MOBO via sequence modeling
Improving performance over rule-based and learning-based MOBO algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Non-Markovian RL for multi-objective Bayesian optimization
Transformer-based sequence modeling for MOBO
Deep Q-learning framework for hyperparameter optimization
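To make the non-Markovian Q-learning idea concrete, here is a toy tabular analogue in which the Q-function is keyed on the entire observation history rather than only the latest state. This is an illustrative sketch under stated assumptions, not the paper's code: BOFormer replaces such a lookup table with a Transformer over the history sequence, and all names below are hypothetical.

```python
def q_update(Q, history, action, reward, next_history, actions,
             alpha=0.5, gamma=0.9):
    """One Q-learning step where 'states' are whole observation histories,
    relaxing the Markov assumption of conventional Q-learning."""
    h, h_next = tuple(history), tuple(next_history)
    # Bootstrap from the best action value under the *extended* history.
    best_next = max(Q.get((h_next, a), 0.0) for a in actions)
    old = Q.get((h, action), 0.0)
    # Standard temporal-difference update, applied to history-keyed values.
    Q[(h, action)] = old + alpha * (reward + gamma * best_next - old)
    return Q[(h, action)]
```

Because the table is indexed by full histories it grows quickly, which is exactly why a sequence model such as a Transformer is used in practice to generalize across histories instead of enumerating them.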