Bayesian Optimization of Catalysis With In-Context Learning

📅 2023-04-11
📈 Citations: 30
Influential: 1
🤖 AI Summary
Catalyst discovery is bottlenecked by the cost of characterizing suboptimal materials, by conventional Bayesian optimization's (BO) reliance on smoothness and continuity assumptions, and by the manual design of structural/electronic descriptors. Method: The paper proposes BO-ICL, a framework that pairs frozen large language models (e.g., GPT-3.5, Gemini) with uncertainty-aware in-context learning, requiring neither model training nor feature engineering, to model highly nonlinear, heterogeneous catalytic responses directly in natural-language space. Contribution/Results: BO-ICL sidesteps the implicit continuity/smoothness assumptions of standard surrogate models, yielding interpretable, actionable predictions from unstructured inputs. It matches or surpasses Gaussian-process performance on oxidative coupling of methane (OCM) and aqueous-solubility benchmarks. In real-world reverse water-gas shift (RWGS) experiments, BO-ICL identified near-optimal multimetallic catalysts within six iterations from a candidate pool of 3,700.
📝 Abstract
Large language models (LLMs) can perform accurate classification with zero or few examples through in-context learning. We extend this capability to regression with uncertainty estimation using frozen LLMs (e.g., GPT-3.5, Gemini), enabling Bayesian optimization (BO) in natural language without explicit model training or feature engineering. We apply this to materials discovery by representing experimental catalyst synthesis and testing procedures as natural language prompts. A key challenge in materials discovery is the need to characterize suboptimal candidates, which slows progress. While BO is effective for navigating large design spaces, standard surrogate models like Gaussian processes assume smoothness and continuity, an assumption that fails in highly non-linear domains such as heterogeneous catalysis. Our task-agnostic BO workflow overcomes this by operating directly in language space, producing interpretable and actionable predictions without requiring structural or electronic descriptors. On benchmarks like aqueous solubility and oxidative coupling of methane (OCM), BO-ICL matches or outperforms Gaussian processes. In live experiments on the reverse water-gas shift (RWGS) reaction, BO-ICL identifies near-optimal multi-metallic catalysts within six iterations from a pool of 3,700 candidates. Our method redefines materials representation and accelerates discovery, with broad applications across catalysis, materials science, and AI. Code: https://github.com/ur-whitelab/BO-ICL.
Problem

Research questions and friction points this paper is trying to address.

Optimizing catalyst discovery using Bayesian optimization with language models
Overcoming limitations of Gaussian processes in non-linear catalysis domains
Accelerating materials discovery without explicit training or feature engineering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses frozen LLMs for Bayesian optimization
Applies in-context learning to materials discovery
Operates directly in language space
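
The loop sketched by these bullets can be illustrated in miniature: candidates are described as natural-language prompts, a frozen LLM sampled at nonzero temperature yields a distribution of numeric predictions per prompt (giving a mean and uncertainty without any training), and an acquisition function such as expected improvement picks the next experiment. The sketch below is a hypothetical illustration, not the BO-ICL API; `llm_predict` stands in for real LLM calls and is stubbed with a noisy function.

```python
import math
import random
import statistics

random.seed(0)

def llm_predict(prompt: str, n_samples: int = 5) -> list[float]:
    # Stand-in for sampling numeric completions from a frozen LLM at
    # temperature > 0, so repeated calls approximate a predictive
    # distribution. Here faked with a noisy function of the prompt.
    x = float(prompt.split()[-1])
    return [math.sin(x) + random.gauss(0, 0.1) for _ in range(n_samples)]

def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.01) -> float:
    # Standard EI for maximization, using the error function for the
    # Gaussian CDF so no external libraries are needed.
    if sigma == 0:
        return 0.0
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    return (mu - best - xi) * cdf + sigma * pdf

# Candidates represented purely as natural-language prompts.
candidates = [f"Catalyst yield for composition x = {x:.2f}" for x in [0.5, 1.0, 1.5, 2.0]]
observed: dict[str, float] = {}

for _ in range(3):  # BO iterations
    best = max(observed.values(), default=0.0)
    scores = {}
    for c in candidates:
        if c in observed:
            continue
        samples = llm_predict(c)
        mu, sigma = statistics.mean(samples), statistics.pstdev(samples)
        scores[c] = expected_improvement(mu, sigma, best)
    pick = max(scores, key=scores.get)       # most promising untested candidate
    observed[pick] = statistics.mean(llm_predict(pick))  # "run the experiment"
```

In the real workflow the observed (prompt, result) pairs are fed back into the LLM's context as few-shot examples, which is what makes the surrogate improve between iterations without gradient updates.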
M. C. Ramos
Department of Chemical Engineering, University of Rochester
Shane S. Michtavy
Department of Chemical Engineering, University of Rochester
Marc D. Porosoff
Department of Chemical Engineering, University of Rochester
Andrew D. White
FutureHouse, University of Rochester