Large language models in materials science and the need for open-source approaches

📅 2025-11-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Materials science research currently relies heavily on proprietary large language models (LLMs), which limits transparency and reproducibility, raises privacy risks, and imposes high costs. This work surveys open-source, LLM-driven approaches across the end-to-end materials discovery pipeline, spanning literature mining, structure–property modelling, and multi-agent closed-loop experimentation. The areas covered include text information extraction, graph neural network–enhanced relational learning, and multi-agent collaborative decision-making, with interfaces to computational tools and laboratory automation systems. Systematic benchmarking across diverse materials science tasks shows leading open-weight models achieving performance on par with closed-source counterparts such as GPT-4. These findings support scientific reproducibility and data sovereignty, lower barriers to AI adoption, and point toward community-driven, transparent AI-for-Science platforms.

📝 Abstract
Large language models (LLMs) are rapidly transforming materials science. This review examines recent LLM applications across the materials discovery pipeline, focusing on three key areas: mining scientific literature, predictive modelling, and multi-agent experimental systems. We highlight how LLMs extract valuable information such as synthesis conditions from text, learn structure-property relationships, and can coordinate agentic systems integrating computational tools and laboratory automation. While progress has been largely dependent on closed-source commercial models, our benchmark results demonstrate that open-source alternatives can match performance while offering greater transparency, reproducibility, cost-effectiveness, and data privacy. As open-source models continue to improve, we advocate their broader adoption to build accessible, flexible, and community-driven AI platforms for scientific discovery.
Problem

Research questions and friction points this paper is trying to address.

Extracting synthesis conditions and scientific information from text data
Learning structure-property relationships through predictive modeling approaches
Coordinating multi-agent systems integrating computational tools and automation
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs extract synthesis conditions from text
LLMs learn structure-property relationships for predictions
Open-source models coordinate multi-agent experimental systems
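The literature-mining step listed above can be sketched in code. The snippet below is a hypothetical illustration, not the paper's implementation: it prompts a model for synthesis conditions as structured JSON and validates the reply. The `call_llm` function, the prompt wording, and the key names (`precursors`, `temperature_c`, `duration_h`) are all assumptions; `call_llm` is a stub standing in for any open-weight chat model API.

```python
import json

# Prompt asking the model for machine-readable synthesis conditions.
# The key names here are illustrative assumptions, not from the paper.
PROMPT_TEMPLATE = (
    "Extract the synthesis conditions from the passage below. "
    "Reply with JSON containing the keys 'precursors', "
    "'temperature_c', and 'duration_h'.\n\nPassage: {passage}"
)

def call_llm(prompt: str) -> str:
    # Stub standing in for a real call to a local open-weight model;
    # returns a canned reply so the sketch runs end to end.
    return '{"precursors": ["ZnO", "TiO2"], "temperature_c": 900, "duration_h": 12}'

def extract_conditions(passage: str) -> dict:
    """Prompt the model and parse its JSON reply, rejecting malformed output."""
    reply = call_llm(PROMPT_TEMPLATE.format(passage=passage))
    data = json.loads(reply)  # raises ValueError on non-JSON replies
    missing = {"precursors", "temperature_c", "duration_h"} - data.keys()
    if missing:
        raise ValueError(f"model reply missing keys: {missing}")
    return data

conditions = extract_conditions(
    "ZnO and TiO2 powders were ball-milled and calcined at 900 C for 12 h."
)
print(conditions)
```

In practice the canned reply would be a real model call, and the JSON validation step is what makes such extraction pipelines robust to free-form model output.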
Fengxu Yang
School of Physics, Chemistry and Earth Sciences, The University of Adelaide, Adelaide 5005, Australia
Weitong Chen
The University of Adelaide
Data Mining · Machine Learning · Health Data Analysis
Jack D. Evans
School of Physics, Chemistry and Earth Sciences, The University of Adelaide, Adelaide 5005, Australia