Construction and Application of Materials Knowledge Graph in Multidisciplinary Materials Science via Large Language Model

📅 2024-04-03
🏛️ Neural Information Processing Systems
📈 Citations: 1
Influential: 0
🤖 AI Summary
Problem: Materials science knowledge is highly fragmented, and traditional experimental discovery is costly and time-consuming, impeding efficient novel material development; existing AI approaches suffer from poor data quality, heavy reliance on manual annotation, and insufficient semantic traceability. Method: This paper proposes an end-to-end knowledge graph (KG) construction paradigm tailored for multidisciplinary materials science, introducing a novel collaborative framework integrating large language models (LLMs) with domain-specific ontologies. It unifies named entity recognition, relation extraction, KG embedding, and link prediction to automatically extract structured triples from high-quality literature published over the past decade. Contribution/Results: The resulting Materials Knowledge Graph (MKG) comprises 162,605 nodes and 731,772 edges, enabling cross-disciplinary semantic alignment and traceable knowledge organization. Empirical evaluation demonstrates significant improvements in knowledge retrieval, logical reasoning, and property prediction, substantially reducing dependence on empirical trial-and-error.

📝 Abstract
Knowledge in materials science is widely dispersed across extensive scientific literature, posing significant challenges to the efficient discovery and integration of new materials. Traditional methods, often reliant on costly and time-consuming experimental approaches, further complicate rapid innovation. The integration of artificial intelligence with materials science has opened avenues for accelerating the discovery process, though it also demands precise annotation, data extraction, and traceability of information. To tackle these issues, this article introduces the Materials Knowledge Graph (MKG), which utilizes advanced natural language processing techniques integrated with large language models to extract and systematically organize a decade's worth of high-quality research into structured triples. The resulting graph contains 162,605 nodes and 731,772 edges. MKG categorizes information into comprehensive labels such as Name, Formula, and Application, structured around a meticulously designed ontology, thus enhancing data usability and integration. By implementing network-based algorithms, MKG facilitates efficient link prediction and significantly reduces reliance on traditional experimental methods. This structured approach streamlines materials research and lays the groundwork for more sophisticated science knowledge graphs.
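To make the idea of labeled triples concrete, here is a minimal sketch of how such a graph could be represented and how its node and edge counts could be derived. The label names (Name, Formula, Application) come from the abstract; the class names, the relation names (`has_formula`, `used_in`), and the example entries are illustrative assumptions, not the MKG schema or its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str  # ontology label, e.g. "Name", "Formula", "Application"
    value: str  # surface form of the entity


@dataclass(frozen=True)
class Triple:
    head: Node
    relation: str  # hypothetical relation name, not taken from the paper
    tail: Node


def graph_size(triples):
    """Return (number of distinct nodes, number of distinct edges)."""
    nodes = set()
    edges = set()
    for t in triples:
        nodes.add(t.head)
        nodes.add(t.tail)
        edges.add(t)  # frozen dataclasses are hashable, so duplicates collapse
    return len(nodes), len(edges)


# Toy data for illustration only; not extracted from the MKG.
triples = [
    Triple(Node("Name", "titanium dioxide"), "has_formula", Node("Formula", "TiO2")),
    Triple(Node("Formula", "TiO2"), "used_in", Node("Application", "photocatalysis")),
    # The same fact extracted from a second paper collapses into one edge.
    Triple(Node("Formula", "TiO2"), "used_in", Node("Application", "photocatalysis")),
]

n_nodes, n_edges = graph_size(triples)
```

Deduplicating on the full (head, relation, tail) tuple is one simple way to merge the same fact found in several papers; a real pipeline would likely also need entity normalization, which this sketch does not attempt.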
Problem

Research questions and friction points this paper is trying to address.

Efficient discovery of dispersed materials science knowledge
Reducing reliance on costly experimental methods
Structuring unstructured data via knowledge graphs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes large language models for data extraction
Constructs Materials Knowledge Graph with structured triples
Implements network algorithms for link prediction
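The page does not say which network algorithms the paper uses for link prediction. As one hedged illustration of the general idea, the sketch below scores missing material-to-application links with a neighborhood heuristic: materials that share many known applications (Jaccard similarity) vote for each other's remaining applications. The function names and the toy data are assumptions made for this example.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_applications(material_apps, material):
    """Rank applications not yet linked to `material`.

    Each other material votes for its own applications, weighted by how
    similar its application set is to that of `material`.
    """
    known = material_apps[material]
    scores = defaultdict(float)
    for other, apps in material_apps.items():
        if other == material:
            continue
        similarity = jaccard(known, apps)
        for app in apps - known:
            scores[app] += similarity
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy material -> application edges for illustration only.
material_apps = {
    "TiO2": {"photocatalysis", "solar cell"},
    "ZnO": {"photocatalysis", "solar cell", "gas sensor"},
    "SnO2": {"gas sensor"},
}

ranked = predict_applications(material_apps, "TiO2")
```

In this toy graph the only candidate for TiO2 is "gas sensor", supported by its overlap with ZnO. A predicted link of this kind is a hypothesis to check against literature or experiment, not a confirmed fact.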
Yanpeng Ye
School of Computer Science and Engineering, University of New South Wales, Kensington, NSW, Australia
Jie Ren
Department of Materials Science and Engineering, City University of Hong Kong, Hong Kong, China
Shaozhou Wang
GreenDynamics Pty. Ltd, Kensington, NSW, Australia
Yuwei Wan
Department of Linguistics and Translation, City University of Hong Kong, Hong Kong, China
Imran Razzak
MBZUAI, Abu Dhabi
Human-Centered AI, Medical Image Analysis, Medical Artificial Intelligence, Computational Biology
B. Hoex
Haofen Wang
Tongji University
Knowledge Graph, Natural Language Processing, Retrieval Augmented Generation
Tong Xie
Green Dynamics & University of New South Wales
Solar Cells, Large Language Models, Cheminformatics, Nano Materials
Wenjie Zhang
School of Computer Science and Engineering, University of New South Wales, Kensington, NSW, Australia