🤖 AI Summary
Inorganic materials synthesis planning has long been constrained by reliance on heuristic expertise and by the limited generalizability of data-driven models trained on small datasets. To address this, we use general-purpose large language models (e.g., GPT-4.1, Gemini 2.0 Flash) to predict synthesis conditions without task-specific fine-tuning, directly inferring precursor combinations and heat-treatment temperatures. Off-the-shelf models reach a Top-5 precursor prediction accuracy of 66.1% on a held-out set of 1,000 reactions. We further propose a pretraining approach that combines LLM-generated synthetic recipes with literature-mined data, yielding the SyntMTE model. After fine-tuning, SyntMTE reduces the mean absolute error (MAE) in sintering and calcination temperature prediction to 73 °C and 98 °C, respectively, an improvement of up to 8.7% over baselines trained only on experimental data. It also reproduces the experimentally observed dopant-dependent sintering trends for LLZO (Li7La3Zr2O12) electrolytes.
📝 Abstract
Inorganic synthesis planning currently relies primarily on heuristic approaches or on machine-learning models trained on limited datasets, which constrains its generality. We demonstrate that language models, without task-specific fine-tuning, can recall synthesis conditions. Off-the-shelf models, such as GPT-4.1, Gemini 2.0 Flash, and Llama 4 Maverick, achieve a Top-1 precursor-prediction accuracy of up to 53.8% and a Top-5 accuracy of up to 66.1% on a held-out set of 1,000 reactions. They also predict calcination and sintering temperatures with mean absolute errors below 126 °C, matching specialized regression methods. Ensembling these language models further enhances predictive accuracy and reduces inference cost per prediction by up to 70%. We subsequently employ language models to generate 28,548 synthetic reaction recipes, which we combine with literature-mined examples to pretrain a transformer-based model, SyntMTE. After fine-tuning on the combined dataset, SyntMTE reduces the mean absolute error in sintering temperature prediction to 73 °C and in calcination temperature prediction to 98 °C. This strategy improves model performance by up to 8.7% compared with baselines trained exclusively on experimental data. Finally, in a case study on Li7La3Zr2O12 solid-state electrolytes, we demonstrate that SyntMTE reproduces the experimentally observed dopant-dependent sintering trends. Our hybrid workflow enables scalable, data-efficient inorganic synthesis planning.
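The two headline metrics, Top-k precursor accuracy and mean absolute error (MAE) in temperature, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the toy reactions and temperatures, and the choice to count a prediction as correct only when a candidate's precursor set matches the reference set exactly. The evaluation protocol actually used for the reported numbers may differ.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_candidates: Sequence[Sequence[Sequence[str]]],
    references: Sequence[Sequence[str]],
    k: int,
) -> float:
    """Fraction of targets whose reference precursor set is among the first k candidates.

    Assumption: a candidate counts as a hit only on an exact, order-independent
    match of the full precursor set.
    """
    hits = 0
    for candidates, reference in zip(ranked_candidates, references):
        top_k_sets = [frozenset(c) for c in candidates[:k]]
        if frozenset(reference) in top_k_sets:
            hits += 1
    return hits / len(references)


def mean_absolute_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average absolute difference between predicted and observed temperatures (°C)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical toy data: two targets, each with two ranked precursor candidates.
ranked = [
    [["Li2CO3", "La2O3", "ZrO2"], ["LiOH", "La2O3", "ZrO2"]],
    [["BaCO3", "TiO2"], ["BaO", "TiO2"]],
]
reference = [
    ["LiOH", "La2O3", "ZrO2"],  # matched only by the second-ranked candidate
    ["BaCO3", "TiO2"],          # matched by the first-ranked candidate
]

print(top_k_accuracy(ranked, reference, k=1))  # 0.5
print(top_k_accuracy(ranked, reference, k=2))  # 1.0

# Hypothetical predicted vs. observed sintering temperatures in °C.
print(mean_absolute_error([1200.0, 900.0], [1150.0, 1000.0]))  # 75.0
```

Under this reading, a Top-5 accuracy of 66.1% would mean that the reference precursor set appears among the five highest-ranked candidates for about two thirds of the held-out reactions.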