Structured Extraction of Process Structure Properties Relationships in Materials Science

📅 2025-04-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Efficient extraction of process–structure–property (PSP) relationships from unstructured materials science literature remains challenging due to the lack of standardized annotation schemes and domain-adapted extraction frameworks. Method: This work introduces the first general-purpose annotation schema for PSP ternary relations and proposes a cross-domain structured knowledge extraction framework covering high-temperature materials and microstructural simulation uncertainty quantification. The method integrates MatBERT-CRF sequence labeling with fine-tuned GPT-4o, leveraging cross-domain mixed training and zero-shot transfer strategies. Results: On Domain I, the fine-tuned GPT-4o achieves 89.2% F1 for entity extraction—significantly outperforming MatBERT-CRF. Incorporating Domain II data narrows the performance gap, demonstrating both the annotation schema’s strong generalizability and the complementary strengths of the two approaches. This work establishes a scalable technical paradigm for materials knowledge graph construction and trustworthy large-language-model reasoning.

Technology Category

Application Category

📝 Abstract
With the advent of large language models (LLMs), the vast unstructured text within millions of academic papers is increasingly accessible for materials discovery, although significant challenges remain. While LLMs offer promising few- and zero-shot learning capabilities, particularly valuable in the materials domain where expert annotations are scarce, general-purpose LLMs often fail to address key materials-specific queries without further adaptation. To bridge this gap, fine-tuning LLMs on human-labeled data is essential for effective structured knowledge extraction. In this study, we introduce a novel annotation schema designed to extract generic process-structure-properties relationships from scientific literature. We demonstrate the utility of this approach using a dataset of 128 abstracts, with annotations drawn from two distinct domains: high-temperature materials (Domain I) and uncertainty quantification in simulating materials microstructure (Domain II). Initially, we developed a conditional random field (CRF) model based on MatBERT, a domain-specific BERT variant, and evaluated its performance on Domain I. Subsequently, we compared this model with a fine-tuned LLM (GPT-4o from OpenAI) under identical conditions. Our results indicate that fine-tuning LLMs can significantly improve entity extraction performance over the BERT-CRF baseline on Domain I. However, when additional examples from Domain II were incorporated, the performance of the BERT-CRF model became comparable to that of the GPT-4o model. These findings underscore the potential of our schema for structured knowledge extraction and highlight the complementary strengths of both modeling approaches.
Problem

Research questions and friction points this paper is trying to address.

Extract process-structure-properties relationships from materials science texts
Adapt LLMs for materials-specific queries using fine-tuning and annotation
Compare BERT-CRF and fine-tuned LLM performance on structured knowledge extraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning LLMs for materials-specific knowledge extraction
Novel annotation schema for process-structure-properties relationships
Combining BERT-CRF and GPT-4o for improved performance
🔎 Similar Papers
No similar papers found.
A
Amit K Verma
Computational Engineering Division, Lawrence Livermore National Laboratory, Livermore, CA 94550
Zhisong Zhang
Zhisong Zhang
City University of Hong Kong
Natural Language Processing
Junwon Seo
Junwon Seo
Carnegie Mellon University
RoboticsControlPerception
R
Robin Kuo
Materials Science and Engineering, Carnegie Mellon University, Pittsburgh, PA 15213
R
Runbo Jiang
Materials Science and Engineering, Carnegie Mellon University, Pittsburgh, PA 15213
Emma Strubell
Emma Strubell
Assistant Professor, Carnegie Mellon University
Natural Language ProcessingMachine LearningGreen AI
A
Anthony D Rollett
Materials Science and Engineering, Carnegie Mellon University, Pittsburgh, PA 15213