🤖 AI Summary
Existing latent Bayesian optimization (LBO) methods for high-dimensional structured data, such as molecular sequences, degrade because encoder-decoder mappings reconstruct inputs imperfectly, so the objective value associated with a latent point can differ from the value of the input it decodes to. To address this, the authors propose NF-BO: (1) it uses an invertible normalizing flow as the generative model, giving a one-to-one encoder with an exact inverse decoder and eliminating the reconstruction gap; (2) it introduces SeqFlow, an autoregressive normalizing flow for sequence data that models sequential structure while remaining invertible; and (3) it adds a candidate sampling strategy that dynamically adjusts the exploration probability of each token according to its importance, improving query efficiency in the latent space. On molecular generation tasks, NF-BO significantly outperforms both classical and state-of-the-art LBO approaches, with convergence reported to be 32%–57% faster and with improved validity and diversity of generated molecules.
📝 Abstract
Bayesian Optimization (BO) has been recognized for its effectiveness in optimizing expensive and complex objective functions. Recent advances in Latent Bayesian Optimization (LBO) have shown promise by integrating generative models such as variational autoencoders (VAEs) to manage the complexity of high-dimensional and structured data spaces. However, existing LBO approaches often suffer from the value discrepancy problem, which arises from the reconstruction gap between the input and latent spaces. This value discrepancy propagates errors throughout the optimization process, leading to suboptimal outcomes. To address this issue, we propose Normalizing Flow-based Bayesian Optimization (NF-BO), which uses a normalizing flow as the generative model to establish a one-to-one encoding function from the input space to the latent space, along with its left-inverse decoding function, eliminating the reconstruction gap. Specifically, we introduce SeqFlow, an autoregressive normalizing flow for sequence data. In addition, we develop a new candidate sampling strategy that dynamically adjusts the exploration probability for each token based on its importance. Through extensive experiments, our NF-BO method demonstrates superior performance in molecule generation tasks, significantly outperforming both traditional and recent LBO approaches.
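To make the "one-to-one encoding with a left-inverse decoder" idea concrete, here is a minimal sketch of a toy affine autoregressive flow over a short real-valued sequence. It is not SeqFlow and not the authors' code: the conditioner, the `encode`/`decode` names, the sample sequence, and the `objective` function are all assumptions chosen for illustration. The point it shows is that decoding an encoded sequence returns the original input up to floating-point error, so the objective value attached to a latent point matches the value of the decoded input.

```python
import math

def _conditioner(prefix):
    """Hypothetical conditioner: maps the already-seen prefix to (log_scale, shift)."""
    h = sum(prefix)
    return math.tanh(h), 0.5 * h  # bounded log-scale keeps exp() well-behaved

def encode(x):
    """Map x to z one position at a time, conditioning each step on x[:i]."""
    z = []
    for i, xi in enumerate(x):
        log_s, t = _conditioner(x[:i])
        z.append(xi * math.exp(log_s) + t)
    return z

def decode(z):
    """Invert each affine step; the prefix of x is rebuilt before it is needed."""
    x = []
    for zi in z:
        log_s, t = _conditioner(x)
        x.append((zi - t) * math.exp(-log_s))
    return x

def objective(x):
    """Stand-in for an expensive black-box objective (assumption)."""
    return -sum((xi - 1.0) ** 2 for xi in x)

x = [0.3, -1.2, 2.0, 0.7]
z = encode(x)
x_rec = decode(z)

# Largest element-wise reconstruction error and the resulting value discrepancy.
gap = max(abs(a - b) for a, b in zip(x, x_rec))
value_gap = abs(objective(x) - objective(x_rec))
```

With a VAE-style encoder and decoder, `x_rec` would generally differ from `x`, and `value_gap` would be nonzero; here both gaps are at the level of floating-point rounding because every step of `encode` has an exact algebraic inverse in `decode`.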