aLLoyM: A large language model for alloy phase diagram prediction

📅 2025-07-30

🤖 AI Summary

Problem: Predicting alloy phase diagrams is slow and relies heavily on empirical modeling. Method: This work applies large language models (LLMs) to the task and introduces aLLoyM, a specialized model that captures composition–temperature–phase relationships. Built on Mistral, an open-source pre-trained LLM, it is trained on question-and-answer (Q&A) pairs for binary and ternary phase diagrams curated from the CPDDB database and CALPHAD-based assessments. The model is fine-tuned for two distinct Q&A formats: multiple-choice and short-answer. Contribution/Results: Benchmark evaluations show that fine-tuning substantially improves accuracy on multiple-choice questions, and the short-answer model can generate phase diagrams for unseen alloy systems from their components alone, which supports exploratory prediction. The short-answer fine-tuned model and the complete benchmarking Q&A dataset are publicly released on Hugging Face.

📝 Abstract

Large Language Models (LLMs) are general-purpose tools with wide-ranging applications, including in materials science. In this work, we introduce aLLoyM, a fine-tuned LLM specifically trained on alloy compositions, temperatures, and their corresponding phase information. To develop aLLoyM, we curated question-and-answer (Q&A) pairs for binary and ternary phase diagrams using the open-source Computational Phase Diagram Database (CPDDB) and assessments based on CALPHAD (CALculation of PHAse Diagrams). We fine-tuned Mistral, an open-source pre-trained LLM, for two distinct Q&A formats: multiple-choice and short-answer. Benchmark evaluations demonstrate that fine-tuning substantially enhances performance on multiple-choice phase diagram questions. Moreover, the short-answer model of aLLoyM exhibits the ability to generate novel phase diagrams from its components alone, underscoring its potential to accelerate the discovery of previously unexplored materials systems. To promote further research and adoption, we have publicly released the short-answer fine-tuned version of aLLoyM, along with the complete benchmarking Q&A dataset, on Hugging Face.
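The two Q&A formats mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' data pipeline: the `PhaseRecord` type, the question wording, the option labels, and the placeholder element and phase names are all assumptions made for illustration, and no real CPDDB or CALPHAD data is used.

```python
import random
from dataclasses import dataclass


@dataclass
class PhaseRecord:
    """Hypothetical record: one point of a phase diagram (not the CPDDB schema)."""
    composition: dict      # element symbol -> mole percent
    temperature_k: float   # temperature in kelvin
    phases: tuple          # names of the phases present at that point


def describe(record):
    """Render the composition as text, e.g. 'A 60% + B 40%'."""
    parts = [f"{el} {pct:g}%" for el, pct in sorted(record.composition.items())]
    return " + ".join(parts)


def short_answer_pair(record):
    """Build a free-text Q&A pair: the answer is the list of phases."""
    question = (
        f"What phases form when {describe(record)} "
        f"are mixed at {record.temperature_k:g} K?"
    )
    return {"question": question, "answer": " + ".join(record.phases)}


def multiple_choice_pair(record, distractors, rng):
    """Build a four-option Q&A pair: the answer is the letter of the correct option."""
    correct = " + ".join(record.phases)
    # Keep at most three wrong options and never duplicate the correct one.
    options = [correct] + [d for d in distractors if d != correct][:3]
    rng.shuffle(options)
    labels = "ABCD"
    lines = [f"{labels[i]}. {opt}" for i, opt in enumerate(options)]
    question = short_answer_pair(record)["question"] + "\n" + "\n".join(lines)
    return {"question": question, "answer": labels[options.index(correct)]}


# Placeholder values, invented purely to exercise the functions.
record = PhaseRecord({"A": 60, "B": 40}, 1200.0, ("ALPHA", "LIQUID"))
print(short_answer_pair(record))
print(multiple_choice_pair(record, ["ALPHA", "LIQUID", "BETA"], random.Random(0)))
```

Passing the random generator in explicitly keeps the option order reproducible, which matters if the same questions are reused for benchmarking.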

Problem

Research questions and friction points this paper is trying to address.

Predicting alloy phase diagrams using fine-tuned LLMs

Enhancing accuracy in multiple-choice phase diagram questions

Generating novel phase diagrams from alloy components
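As a rough illustration of how multiple-choice accuracy might be scored, the sketch below extracts an option letter from each model reply and compares it with the reference answer. The paper's actual evaluation protocol is not described here, so the item format, the `answer_fn` callable standing in for a model, and the rule that the reply must begin with the option letter are all assumptions.

```python
import re


def extract_choice(text, labels="ABCD"):
    """Return the option letter a reply starts with, or None if there is none.

    Assumes the model is prompted to answer with the letter first,
    e.g. 'B', '(B)' or 'B. LIQUID'.
    """
    match = re.match(rf"\s*\(?([{labels}])\b", text)
    return match.group(1) if match else None


def multiple_choice_accuracy(items, answer_fn):
    """Fraction of items whose extracted letter equals the reference answer.

    items: list of {'question': str, 'answer': letter} dicts (hypothetical format).
    answer_fn: any callable mapping a question string to the model's reply text.
    """
    if not items:
        return 0.0
    correct = sum(
        extract_choice(answer_fn(item["question"])) == item["answer"]
        for item in items
    )
    return correct / len(items)


# Canned replies stand in for a real model so the example runs offline.
items = [
    {"question": "q1", "answer": "A"},
    {"question": "q2", "answer": "B"},
    {"question": "q3", "answer": "C"},
    {"question": "q4", "answer": "D"},
]
replies = {"q1": "A. ALPHA", "q2": "(B)", "q3": "BETA", "q4": "C"}
print(multiple_choice_accuracy(items, replies.get))
```

Replies that do not start with a valid letter, such as a bare phase name, are counted as wrong rather than guessed at, which keeps the score conservative.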

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned Mistral model for alloy phase prediction

Q&A training pairs curated from CPDDB and CALPHAD-based assessments

Short-answer model and benchmarking Q&A dataset publicly released on Hugging Face
