Symbol-based entity marker highlighting for enhanced text mining in materials science with generative AI

📅 2025-05-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low accuracy of structured data extraction from materials science literature, this paper proposes a hybrid text-mining framework. Symbolic entity markers are introduced to improve named entity recognition (NER) performance. The framework then converts raw text into entity-recognized text and subsequently into structured form, so that entities and relations are extracted within one pipeline. The method combines the strengths of the multi-step and direct paradigms, which each have limitations when applied independently. On three benchmark datasets (MatScholar, SOFC, and SOFC slot NER), the entity marker-based approach consistently outperforms previous entity recognition approaches, and it improves the final structured data by up to 58% in entity-level F1 score and up to 83% in relation-level F1 score compared to the direct approach.

📝 Abstract
The construction of experimental datasets is essential for expanding the scope of data-driven scientific discovery. Recent advances in natural language processing (NLP) have facilitated automatic extraction of structured data from unstructured scientific literature. While existing approaches (multi-step and direct methods) offer valuable capabilities, they also come with limitations when applied independently. Here, we propose a novel hybrid text-mining framework that integrates the advantages of both methods to convert unstructured scientific text into structured data. Our approach first transforms raw text into entity-recognized text, and subsequently into structured form. Furthermore, beyond the overall data structuring framework, we also enhance entity recognition performance by introducing an entity marker, a simple yet effective technique that uses symbolic annotations to highlight target entities. Specifically, our entity marker-based hybrid approach not only consistently outperforms previous entity recognition approaches across three benchmark datasets (MatScholar, SOFC, and SOFC slot NER) but also improves the quality of the final structured data, yielding up to a 58% improvement in entity-level F1 score and up to an 83% improvement in relation-level F1 score compared to the direct approach.
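The idea of highlighting target entities with symbolic annotations can be illustrated with a minimal sketch. The marker symbols (`@`, `#`), the label names (`MAT`, `PRO`), and the function `mark_entities` are all hypothetical choices made for illustration; the abstract does not specify which symbols or labels the authors use, nor how the marked text is produced.

```python
from typing import List, Tuple

# (start, end, label) with end exclusive; character offsets into the raw text.
Span = Tuple[int, int, str]

# Hypothetical marker symbols per entity label (not taken from the paper).
MARKERS = {"MAT": ("@", "@"), "PRO": ("#", "#")}


def mark_entities(text: str, spans: List[Span]) -> str:
    """Wrap each entity span in its label's symbols to form entity-recognized text."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans are not supported in this sketch")
        open_sym, close_sym = MARKERS[label]
        pieces.append(text[cursor:start])          # untouched text before the entity
        pieces.append(f"{open_sym}{text[start:end]}{close_sym}")  # highlighted entity
        cursor = end
    pieces.append(text[cursor:])                   # trailing text after the last entity
    return "".join(pieces)


sentence = "LiFePO4 shows high capacity."
spans = [(0, 7, "MAT"), (19, 27, "PRO")]
marked = mark_entities(sentence, spans)
print(marked)  # @LiFePO4@ shows high #capacity#.
```

In this sketch the marked sentence plays the role of the intermediate "entity-recognized text" that a later step would convert into structured form.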
Problem

Research questions and friction points this paper is trying to address.

Hybrid text-mining framework for structured data conversion
Enhanced entity recognition using symbolic annotations
Improving entity and relation extraction in materials science
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid text-mining framework combining multi-step and direct methods
Symbol-based entity marker for enhanced entity recognition
Improved structured data quality with significant F1 score gains
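For readers unfamiliar with the metric, entity-level F1 can be sketched as follows. This assumes exact matching on (text, label) pairs, which is a common convention; the matching criterion actually used in the paper is not stated in this summary, and the label names and example entities below are hypothetical.

```python
from typing import Set, Tuple

# An entity is identified here by its surface text and its label.
Entity = Tuple[str, str]


def entity_f1(predicted: Set[Entity], gold: Set[Entity]) -> float:
    """Harmonic mean of precision and recall over exactly matched entities."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted entity sets for one sentence.
gold = {("LiFePO4", "MAT"), ("capacity", "PRO"), ("sol-gel", "SMT")}
predicted = {("LiFePO4", "MAT"), ("capacity", "PRO"), ("cathode", "APL")}
score = entity_f1(predicted, gold)
print(round(score, 3))  # 0.667
```

Relation-level F1 can be computed the same way by replacing entities with relation tuples, again under an exact-match assumption.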
Junhyeong Lee
Ph.D. Candidate, KAIST
Data-driven Design, Artificial Intelligence, Computational Mechanics
Jongmin Yuk
Department of Materials Science and Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
Chan-Woo Lee
Energy Storage Research Department, Korea Institute of Energy Research, Daejeon 34129, Republic of Korea; Energy AI & Computational Science Laboratory, Korea Institute of Energy Research, Daejeon 34129, Republic of Korea