Large language models in materials science and the need for open-source approaches

📅 2025-11-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Materials science research currently relies heavily on proprietary large language models (LLMs), which limits transparency and reproducibility, raises privacy risks, and imposes high costs. This work surveys open-source, LLM-driven approaches across the end-to-end materials discovery pipeline, spanning literature mining, structure–property modelling, and multi-agent closed-loop experimentation. The areas covered include text information extraction, graph neural network–enhanced relational learning, and multi-agent collaborative decision-making, with interfaces to computational tools and laboratory automation systems. Systematic benchmarking across diverse materials science tasks shows leading open-weight models achieving performance on par with closed-source counterparts such as GPT-4. These findings support scientific reproducibility and data sovereignty, lower barriers to AI adoption, and point toward community-driven, transparent AI-for-Science platforms.

📝 Abstract
Large language models (LLMs) are rapidly transforming materials science. This review examines recent LLM applications across the materials discovery pipeline, focusing on three key areas: mining scientific literature, predictive modelling, and multi-agent experimental systems. We highlight how LLMs extract valuable information such as synthesis conditions from text, learn structure-property relationships, and can coordinate agentic systems integrating computational tools and laboratory automation. While progress has been largely dependent on closed-source commercial models, our benchmark results demonstrate that open-source alternatives can match performance while offering greater transparency, reproducibility, cost-effectiveness, and data privacy. As open-source models continue to improve, we advocate their broader adoption to build accessible, flexible, and community-driven AI platforms for scientific discovery.
Problem

Research questions and friction points this paper is trying to address.

Extracting synthesis conditions and scientific information from text data
Learning structure-property relationships through predictive modeling approaches
Coordinating multi-agent systems integrating computational tools and automation
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs extract synthesis conditions from text
LLMs learn structure-property relationships for predictions
Open-source models coordinate multi-agent experimental systems
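The literature-mining step listed above can be sketched in code. The snippet below is a hypothetical illustration, not the paper's implementation: it prompts a model for synthesis conditions as structured JSON and validates the reply. The `call_llm` function, the prompt wording, and the key names (`precursors`, `temperature_c`, `duration_h`) are all assumptions; `call_llm` is a stub standing in for any open-weight chat model API.

```python
import json

# Prompt asking the model for machine-readable synthesis conditions.
# The key names here are illustrative assumptions, not from the paper.
PROMPT_TEMPLATE = (
    "Extract the synthesis conditions from the passage below. "
    "Reply with JSON containing the keys 'precursors', "
    "'temperature_c', and 'duration_h'.\n\nPassage: {passage}"
)

def call_llm(prompt: str) -> str:
    # Stub standing in for a real call to a local open-weight model;
    # returns a canned reply so the sketch runs end to end.
    return '{"precursors": ["ZnO", "TiO2"], "temperature_c": 900, "duration_h": 12}'

def extract_conditions(passage: str) -> dict:
    """Prompt the model and parse its JSON reply, rejecting malformed output."""
    reply = call_llm(PROMPT_TEMPLATE.format(passage=passage))
    data = json.loads(reply)  # raises ValueError on non-JSON replies
    missing = {"precursors", "temperature_c", "duration_h"} - data.keys()
    if missing:
        raise ValueError(f"model reply missing keys: {missing}")
    return data

conditions = extract_conditions(
    "ZnO and TiO2 powders were ball-milled and calcined at 900 C for 12 h."
)
print(conditions)
```

In practice the canned reply would be a real model call, and the JSON validation step is what makes such extraction pipelines robust to free-form model output.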
Fengxu Yang
School of Physics, Chemistry and Earth Sciences, The University of Adelaide, Adelaide 5005, Australia
Weitong Chen
The University of Adelaide
Data Mining · Machine Learning · Health Data Analysis
Jack D. Evans
School of Physics, Chemistry and Earth Sciences, The University of Adelaide, Adelaide 5005, Australia