π€ AI Summary
In multi-objective molecular design, it remains unclear whether Pareto-aware Bayesian optimization (e.g., Expected Hypervolume Improvement, EHVI) outperforms scalarization-based approaches (e.g., Expected Improvement, EI), particularly under limited evaluation budgets and complex trade-offs. This work systematically compares EHVI and EI across three molecular optimization tasks, employing Gaussian process surrogate models and standard chemical benchmarks. Results demonstrate that EHVI significantly improves Pareto front coverage (+23β41%), accelerates convergence (reaching the convergence threshold on average 37% faster than EI), and enhances chemical diversity of generated molecules (reducing FrΓ©chet ChemNet Distance by 18%). These advantages are especially pronounced in low-data regimes. To our knowledge, this is the first empirical study in molecular design to rigorously establish the substantial gains of Pareto-aware strategies over scalarization, providing a more robust and practical paradigm for resource-constrained multi-objective de novo molecular optimization.
π Abstract
Multi-objective Bayesian optimization (MOBO) provides a principled framework for navigating trade-offs in molecular design. However, its empirical advantages over scalarized alternatives remain underexplored. We benchmark a simple Pareto-based MOBO strategy -- Expected Hypervolume Improvement (EHVI) -- against a simple fixed-weight scalarized baseline using Expected Improvement (EI), under a tightly controlled setup with identical Gaussian Process surrogates and molecular representations. Across three molecular optimization tasks, EHVI consistently outperforms scalarized EI in terms of Pareto front coverage, convergence speed, and chemical diversity. While scalarization encompasses flexible variants -- including random or adaptive schemes -- our results show that even strong deterministic instantiations can underperform in low-data regimes. These findings offer concrete evidence for the practical advantages of Pareto-aware acquisition in de novo molecular optimization, especially when evaluation budgets are limited and trade-offs are nontrivial.