🤖 AI Summary
Multi-objective Bayesian optimization (MOBO) suffers from identifiability issues in hypervolume estimation due to non-Markovian dependencies in the acquisition process. Method: This work models MOBO as a non-Markovian reinforcement learning problem and introduces the first sequence-based deep Q-learning framework for MOBO. It integrates a Transformer architecture to capture historical dependencies, Gaussian process surrogate models, an adaptive hypervolume-based reward function, and a non-Markovian RL policy, thereby relaxing the restrictive Markov assumption inherent in conventional approaches. Contribution/Results: The proposed method achieves significant improvements over both rule-based and learning-based baselines on synthetic benchmarks and real-world multi-objective hyperparameter tuning tasks. To foster reproducibility and community advancement, the implementation is publicly released as open-source software.
📄 Abstract
Bayesian optimization (BO) offers an efficient pipeline for optimizing black-box functions with the help of a Gaussian process prior and an acquisition function (AF). Recently, in the context of single-objective BO, learning-based AFs have shown promising empirical results owing to their favorable non-myopic nature. Despite this, the direct extension of these approaches to multi-objective Bayesian optimization (MOBO) suffers from the *hypervolume identifiability issue*, which results from the non-Markovian nature of MOBO problems. To tackle this, inspired by the non-Markovian RL literature and the success of Transformers in language modeling, we present a generalized deep Q-learning framework and propose *BOFormer*, which substantiates this framework for MOBO via sequence modeling. Through extensive evaluation, we demonstrate that BOFormer consistently outperforms the benchmark rule-based and learning-based algorithms on various synthetic MOBO and real-world multi-objective hyperparameter optimization problems. We have made the source code publicly available to encourage further research in this direction.
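The hypervolume-based reward underlying such frameworks is built on hypervolume improvement: the growth of the objective-space region dominated by the observed points after each new query. As a minimal sketch of that core quantity (not the paper's adaptive reward), here is an exact 2D computation, assuming a maximization problem with a fixed reference point; the function names are illustrative:

```python
def hypervolume_2d(points, ref):
    """Exact hypervolume of a 2D point set (maximization) w.r.t. a reference point.

    The dominated region is a union of axis-aligned rectangles; sorting by the
    first objective in descending order lets us sweep it in one pass.
    """
    # Keep only points that strictly dominate the reference point.
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    hv, prev_y = 0.0, ref[1]
    for x, y in sorted(pts, key=lambda p: p[0], reverse=True):
        if y > prev_y:  # points dominated by an earlier point add no area
            hv += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return hv

def hv_improvement_reward(observed, new_point, ref=(0.0, 0.0)):
    """Per-step reward: gain in dominated hypervolume from querying new_point."""
    before = hypervolume_2d(observed, ref)
    after = hypervolume_2d(observed + [new_point], ref)
    return after - before
```

For example, with observed objective vectors `[(3, 1), (2, 2), (1, 3)]` and reference point `(0, 0)`, the dominated hypervolume is 6.0, and a new query landing at `(2.5, 2.5)` yields a positive reward because it extends the Pareto front.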