Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions

📅 2024-09-22
🏛️ arXiv.org
🤖 AI Summary
This study systematically evaluates the performance and robustness of large language models (LLMs) on materials science tasks where reliability is critical: question answering on course-style questions and property prediction for steel yield strength and band gap. Method: The evaluation uses zero-shot chain-of-thought prompting, expert prompting, and few-shot in-context learning, and tests robustness by perturbing inputs with noise that ranges from realistic disturbances to intentionally adversarial manipulations. Contribution/Results: The study identifies a “mode collapse” behavior in LLM-based property prediction when the proximity of prompt examples is altered, and an unexpected accuracy gain under train/test mismatch, both of which expose limits on robustness. Based on these findings, it argues for robustness evaluation tailored to scientific reliability and offers actionable risk alerts for the trustworthy deployment of LLMs in materials science.

📝 Abstract
Large Language Models (LLMs) have the potential to revolutionize scientific research, yet their robustness and reliability in domain-specific applications remain insufficiently explored. This study conducts a comprehensive evaluation and robustness analysis of LLMs within the field of materials science, focusing on domain-specific question answering and materials property prediction. Three distinct datasets are used in this study: 1) a set of multiple-choice questions from undergraduate-level materials science courses, 2) a dataset including various steel compositions and yield strengths, and 3) a band gap dataset, containing textual descriptions of material crystal structures and band gap values. The performance of LLMs is assessed using various prompting strategies, including zero-shot chain-of-thought, expert prompting, and few-shot in-context learning. The robustness of these models is tested against various forms of 'noise', ranging from realistic disturbances to intentionally adversarial manipulations, to evaluate their resilience and reliability under real-world conditions. Additionally, the study uncovers unique phenomena of LLMs during predictive tasks, such as mode collapse behavior when the proximity of prompt examples is altered and performance enhancement from train/test mismatch. The findings aim to provide informed skepticism for the broad use of LLMs in materials science and to inspire advancements that enhance their robustness and reliability for practical applications.
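
The prompting strategies named in the abstract can be sketched as plain string templates. This is a minimal illustration, not the paper's prompts: the expert persona wording, the `Material:`/`Value:` layout, the function names, and the example compositions and values are all assumptions made here. Only the "Let's think step by step." trigger is the commonly used zero-shot chain-of-thought phrase.

```python
from typing import List, Tuple

# Hypothetical wording; the abstract does not give the exact prompts.
EXPERT_PREFIX = "You are an expert in materials science."
COT_SUFFIX = "Let's think step by step."


def zero_shot_cot_prompt(question: str, expert: bool = False) -> str:
    """Build a zero-shot chain-of-thought prompt, optionally with an expert persona."""
    parts = []
    if expert:
        parts.append(EXPERT_PREFIX)
    parts.append(f"Question: {question}")
    parts.append(COT_SUFFIX)
    return "\n".join(parts)


def few_shot_prompt(examples: List[Tuple[str, float]], query: str, unit: str) -> str:
    """Build a few-shot in-context prompt from (description, value) pairs."""
    lines = []
    for description, value in examples:
        lines.append(f"Material: {description}")
        lines.append(f"Value: {value} {unit}")
    # The query repeats the pattern and leaves the value for the model to complete.
    lines.append(f"Material: {query}")
    lines.append("Value:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder compositions and values, invented for illustration only.
    demo = [("Fe-0.20C-1.0Mn", 350.0), ("Fe-0.40C-1.5Mn", 520.0)]
    print(zero_shot_cot_prompt("Which bonding type dominates in NaCl?", expert=True))
    print(few_shot_prompt(demo, "Fe-0.30C-1.2Mn", "MPa"))
```

Keeping the templates as pure functions makes it straightforward to hold the question fixed while swapping strategies, which is the kind of comparison the abstract describes.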
Problem

Research questions and friction points this paper is trying to address.

Evaluate LLM performance in materials science Q&A.
Assess LLM robustness in property prediction tasks.
Test LLMs under diverse and adversarial conditions.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated LLMs with multiple prompting strategies.
Tested robustness against realistic and adversarial noise.
Analyzed unique phenomena like mode collapse behavior.
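
Two of the manipulations listed above, changing how close the in-context examples are to the query and injecting noise into the input text, might be set up roughly as follows. This is a sketch under stated assumptions: Euclidean distance over a numeric feature vector and adjacent-letter swaps are stand-ins chosen here for illustration, and the names `select_examples` and `add_typos` are hypothetical. The paper's actual proximity measure and noise types are not specified in this summary.

```python
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (feature vector, target value)


def select_examples(pool: List[Example], query: Sequence[float], k: int,
                    nearest: bool = True) -> List[Example]:
    """Pick the k pool examples closest to (or farthest from) the query.

    Assumption: proximity is Euclidean distance between feature vectors.
    """
    ranked = sorted(pool, key=lambda ex: math.dist(ex[0], query),
                    reverse=not nearest)
    return ranked[:k]


def add_typos(text: str, rate: float, seed: int = 0) -> str:
    """Swap adjacent letters with probability `rate` to mimic typing noise."""
    rng = random.Random(seed)  # seeded so a perturbation can be reproduced
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so a letter moves at most once
        else:
            i += 1
    return "".join(chars)
```

Toggling `nearest` while holding `k` fixed gives two prompt sets that differ only in example proximity, and sweeping `rate` from zero upward gives a graded noise level against which accuracy can be plotted.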
Authors

Hongchen Wang
Department of Materials Science and Engineering, University of Toronto, Toronto, M5S 1A1, Canada
Kangming Li
Assistant Professor at King Abdullah University of Science and Technology (KAUST)
Materials informatics, first principles calculations, machine learning
Scott Ramsay
Department of Materials Science and Engineering, University of Toronto, Toronto, M5S 1A1, Canada
Yao Fehlis
Artificial, Inc.
Computational Chemistry, Plasmonics, Machine Learning
Edward Kim
Department of Materials Science and Engineering, University of Toronto, Toronto, M5S 1A1, Canada
Jason Hattrick-Simpers
Department of Materials Science and Engineering, University of Toronto
artificial intelligence, autonomous science, combinatorial materials science, compositionally complex alloys, metallic glasses