🤖 AI Summary
Conventional approaches to materials property prediction suffer from poor generalizability, heavy reliance on handcrafted descriptors, and an inability to model complex, high-dimensional materials property spaces. Method: This work introduces DARWIN 1.5, the largest open-source large language model tailored for materials science, built upon the LLaMA-7B architecture and trained on 6 million materials-domain publications alongside experimental data across modalities. It employs domain alignment, instruction fine-tuning, and cross-task prompt engineering to enable natural-language-driven, task-agnostic property prediction and inverse design without task-specific descriptors. Contribution/Results: DARWIN 1.5 achieves knowledge transfer across 49,256 materials and 21 experimental datasets, improving prediction accuracy by up to 59.1% over the base LLaMA-7B model and outperforming state-of-the-art machine learning methods on all eight evaluated materials design tasks. These results demonstrate the feasibility of large language models as universal foundation models for intelligent materials discovery.
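To make the instruction fine-tuning step concrete, below is a minimal sketch of what a single training record for a property prediction task could look like. It assumes an Alpaca-style instruction/input/output schema, which is common for LLaMA-7B derivatives; the field names, the prompt template, and the example values are illustrative assumptions, not DARWIN's published format.

```python
# Hypothetical instruction fine-tuning record for a materials property
# prediction task, assuming an Alpaca-style schema. The template and
# field names are assumptions, not the paper's published format.
PROMPT_TEMPLATE = (
    "Below is an instruction describing a materials task.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

record = {
    "instruction": "Predict the band gap (in eV) of the given material.",
    "input": "Composition: TiO2, rutile phase",
    "output": "3.0",  # illustrative target value
}

def build_example(rec: dict) -> str:
    """Concatenate the formatted prompt and the target into one training string."""
    return PROMPT_TEMPLATE.format(instruction=rec["instruction"], input=rec["input"]) + rec["output"]

if __name__ == "__main__":
    print(build_example(record))
```

Because every task is expressed in the same natural-language schema, records from different property datasets can be mixed in one training corpus, which is one plausible mechanism for the cross-task knowledge transfer described above.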
📝 Abstract
Materials discovery and design aim to find compositions and structures with desirable properties across highly complex and diverse physical spaces. Traditional solutions, such as high-throughput simulations and machine learning, often rely on complex descriptors, which hinder generalizability and transferability across different material systems. Moreover, these descriptors may inadequately represent macro-scale material properties, which are influenced by structural imperfections and compositional variations in real-world samples, limiting their practical applicability. To address these challenges, we propose DARWIN 1.5, the largest open-source large language model tailored for materials science. By leveraging natural language as input, DARWIN eliminates the need for task-specific descriptors and enables a flexible, unified approach to material property prediction and discovery. Our approach integrates 6M materials-domain papers and 21 experimental datasets covering 49,256 materials across modalities, while enabling cross-task knowledge transfer. The enhanced model achieves up to a 59.1% improvement in prediction accuracy over the base LLaMA-7B architecture and outperforms SOTA machine learning approaches across 8 materials design tasks. These results establish LLMs as a promising foundation for developing versatile and scalable models in materials science.
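The abstract's central claim is that a plain-text prompt replaces engineered descriptors at inference time. The sketch below shows how such natural-language prediction could be run with a LLaMA-style checkpoint via the Hugging Face transformers API, under stated assumptions: the model ID "darwin-1.5-7b" is a placeholder for the released weights, and the prompt follows the same illustrative Alpaca-style template as above, not a documented interface.

```python
# Hedged inference sketch: natural-language property prediction with a
# LLaMA-style checkpoint. "darwin-1.5-7b" is a placeholder model ID,
# not a real hub identifier; substitute the actual released weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "darwin-1.5-7b"  # hypothetical checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16)

# The task is posed entirely in natural language; no handcrafted
# descriptors are computed for the material.
prompt = (
    "### Instruction:\nPredict the band gap (in eV) of the given material.\n\n"
    "### Input:\nComposition: TiO2, rutile phase\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=16, do_sample=False)

# Decode only the tokens generated after the prompt.
prediction = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(prediction)
```

Swapping the instruction and input strings is all that is needed to move between the 8 design tasks, which is the flexibility the descriptor-free formulation is meant to buy.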