AI Summary
Large language models (LLMs) exhibit insufficient knowledge reliability for materials science applications, particularly within the processing-structure-property-performance (PSPP) paradigm.
Method: We systematically evaluate LLMs' reasoning capabilities across PSPP stages and introduce MatKB, the first lightweight, domain-specific knowledge benchmark for materials science, covering fundamental factual tasks including periodic table knowledge, phase diagram fundamentals, and thermodynamic relationships. We analyze tokenizer and vocabulary impacts on material entity disambiguation and employ structured prompting with multi-source factual verification to quantify the performance of leading open-weight models (Llama-3, Qwen2, Phi-3).
Contribution/Results: Results reveal substantial factual inaccuracies: accuracy in structure-property mapping falls below 40%, and generic LLMs consistently underperform domain-specific tools. Tokenizer design critically affects material entity representation fidelity. Our findings underscore the necessity of knowledge augmentation or domain adaptation, providing an empirical benchmark and methodological framework to guide LLM selection and specialization for engineering applications.
Abstract
Large Language Models (LLMs) are increasingly applied in the fields of mechanical engineering and materials science. As models that establish connections through the interface of language, LLMs can be applied for step-wise reasoning through the Processing-Structure-Property-Performance chain of materials science and engineering. Current LLMs are trained to adequately represent a dataset that constitutes most of the accessible internet. However, the internet mostly contains non-scientific content. If LLMs are to be applied for engineering purposes, it is valuable to investigate models for their intrinsic knowledge -- here: the capacity to generate correct information about materials. In the current work, using the example of the Periodic Table of Elements, we highlight the role of vocabulary and tokenization for the uniqueness of material fingerprints, and the capabilities of different state-of-the-art open models to generate factually correct output. This leads to a materials knowledge benchmark that enables an informed choice of the steps in the PSPP chain for which LLMs are applicable, and those where specialized models are required.
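To make the vocabulary point concrete, the following is a minimal sketch of how a tokenizer's vocabulary can change a material's token "fingerprint". The two vocabularies and the greedy longest-match scheme are invented for illustration; real LLM tokenizers (e.g. BPE-based ones) are learned from corpus statistics and are more complex, but the effect is the same: a formula like SnO2 may or may not split along element boundaries.

```python
def tokenize(text, vocab, max_piece_len=4):
    """Greedy longest-match tokenization; unknown single characters
    become their own tokens. Purely illustrative, not a real LLM tokenizer."""
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(len(text) - i, max_piece_len), 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens

# Hypothetical vocabularies (assumptions, not taken from any real model):
VOCAB_A = {"Sn", "O2", "Co", "Si"}          # pieces aligned with element symbols
VOCAB_B = {"S", "nO", "2", "C", "oS", "i"}  # sub-word pieces ignoring chemistry

print(tokenize("SnO2", VOCAB_A))  # ['Sn', 'O2']  -- element-aware fingerprint
print(tokenize("SnO2", VOCAB_B))  # ['S', 'nO', '2']  -- chemistry is lost
```

Under vocabulary A the model sees tin and oxygen as distinct units; under vocabulary B the same formula dissolves into pieces that carry no chemical meaning, which is one way tokenization can undermine the uniqueness of material fingerprints.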