Bayesian Optimization of Catalysis With In-Context Learning

📅 2023-04-11
📈 Citations: 30
Influential: 1
🤖 AI Summary
Catalyst discovery is bottlenecked by the cost of characterizing suboptimal materials, by conventional Bayesian optimization's (BO) reliance on smoothness and continuity assumptions, and by the manual design of structural/electronic descriptors. Method: The paper proposes BO-ICL, a framework that pairs frozen large language models (e.g., GPT-3.5, Gemini) with uncertainty-aware in-context learning, requiring neither model training nor feature engineering, to model highly nonlinear, heterogeneous catalytic responses directly in natural-language space. Contribution/Results: BO-ICL sidesteps the implicit continuity/smoothness assumptions of standard surrogate models, yielding interpretable, actionable predictions from unstructured inputs. It matches or surpasses Gaussian-process performance on oxidative coupling of methane (OCM) and aqueous-solubility benchmarks. In real-world reverse water-gas shift (RWGS) experiments, BO-ICL identified near-optimal multimetallic catalysts within six iterations from a candidate pool of 3,700.
📝 Abstract
Large language models (LLMs) can perform accurate classification with zero or few examples through in-context learning. We extend this capability to regression with uncertainty estimation using frozen LLMs (e.g., GPT-3.5, Gemini), enabling Bayesian optimization (BO) in natural language without explicit model training or feature engineering. We apply this to materials discovery by representing experimental catalyst synthesis and testing procedures as natural language prompts. A key challenge in materials discovery is the need to characterize suboptimal candidates, which slows progress. While BO is effective for navigating large design spaces, standard surrogate models like Gaussian processes assume smoothness and continuity, an assumption that fails in highly non-linear domains such as heterogeneous catalysis. Our task-agnostic BO workflow overcomes this by operating directly in language space, producing interpretable and actionable predictions without requiring structural or electronic descriptors. On benchmarks like aqueous solubility and oxidative coupling of methane (OCM), BO-ICL matches or outperforms Gaussian processes. In live experiments on the reverse water-gas shift (RWGS) reaction, BO-ICL identifies near-optimal multi-metallic catalysts within six iterations from a pool of 3,700 candidates. Our method redefines materials representation and accelerates discovery, with broad applications across catalysis, materials science, and AI. Code: https://github.com/ur-whitelab/BO-ICL.
Problem

Research questions and friction points this paper is trying to address.

Optimizing catalyst discovery using Bayesian optimization with language models
Overcoming limitations of Gaussian processes in non-linear catalysis domains
Accelerating materials discovery without explicit training or feature engineering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses frozen LLMs for Bayesian optimization
Applies in-context learning to materials discovery
Operates directly in language space
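
The loop sketched by these bullets can be illustrated in miniature: candidates are described as natural-language prompts, a frozen LLM sampled at nonzero temperature yields a distribution of numeric predictions per prompt (giving a mean and uncertainty without any training), and an acquisition function such as expected improvement picks the next experiment. The sketch below is a hypothetical illustration, not the BO-ICL API; `llm_predict` stands in for real LLM calls and is stubbed with a noisy function.

```python
import math
import random
import statistics

random.seed(0)

def llm_predict(prompt: str, n_samples: int = 5) -> list[float]:
    # Stand-in for sampling numeric completions from a frozen LLM at
    # temperature > 0, so repeated calls approximate a predictive
    # distribution. Here faked with a noisy function of the prompt.
    x = float(prompt.split()[-1])
    return [math.sin(x) + random.gauss(0, 0.1) for _ in range(n_samples)]

def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.01) -> float:
    # Standard EI for maximization, using the error function for the
    # Gaussian CDF so no external libraries are needed.
    if sigma == 0:
        return 0.0
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    return (mu - best - xi) * cdf + sigma * pdf

# Candidates represented purely as natural-language prompts.
candidates = [f"Catalyst yield for composition x = {x:.2f}" for x in [0.5, 1.0, 1.5, 2.0]]
observed: dict[str, float] = {}

for _ in range(3):  # BO iterations
    best = max(observed.values(), default=0.0)
    scores = {}
    for c in candidates:
        if c in observed:
            continue
        samples = llm_predict(c)
        mu, sigma = statistics.mean(samples), statistics.pstdev(samples)
        scores[c] = expected_improvement(mu, sigma, best)
    pick = max(scores, key=scores.get)       # most promising untested candidate
    observed[pick] = statistics.mean(llm_predict(pick))  # "run the experiment"
```

In the real workflow the observed (prompt, result) pairs are fed back into the LLM's context as few-shot examples, which is what makes the surrogate improve between iterations without gradient updates.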
M. C. Ramos
Department of Chemical Engineering, University of Rochester
Shane S. Michtavy
Department of Chemical Engineering, University of Rochester
Marc D. Porosoff
Department of Chemical Engineering, University of Rochester
Andrew D. White
FutureHouse, University of Rochester