NLP-AKG: Few-Shot Construction of NLP Academic Knowledge Graph Based on LLM

📅 2025-02-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing academic knowledge graphs typically model paper entities or domain concepts in isolation, neglecting deep semantic associations among papers driven by shared concepts. This leads to incomplete knowledge coverage and weak concept–paper alignment in scientific question answering. To address this, we propose a novel deep knowledge graph construction method tailored for NLP-oriented scientific QA. Our approach is the first to explicitly model cross-paper semantic associations grounded in shared domain concepts. We design a few-shot, large language model–driven framework for knowledge extraction and multi-granularity semantic parsing, and introduce citation-aware relational enhancement alongside subgraph community summarization. Constructed from 60K+ papers in the ACL Anthology, our graph comprises 620K entities and 2.27M relations. Experiments on three scientific QA benchmarks demonstrate significant improvements in both answer accuracy and interpretability.

📝 Abstract
Large language models (LLMs) have been widely applied to question answering over scientific research papers. To enhance the professionalism and accuracy of responses, many studies employ external knowledge augmentation. However, existing structures of external knowledge in scientific literature often focus solely on either paper entities or domain concepts, neglecting the intrinsic connections between papers through shared domain concepts. This results in less comprehensive and specific answers when addressing questions that combine papers and concepts. To address this, we propose a novel knowledge graph framework that captures deep conceptual relations between academic papers, constructing a relational network via intra-paper semantic elements and inter-paper citation relations. Using a few-shot, LLM-based knowledge graph construction method, we develop NLP-AKG, an academic knowledge graph for the NLP domain, by extracting 620,353 entities and 2,271,584 relations from 60,826 papers in the ACL Anthology. Based on this, we propose a 'sub-graph community summary' method and validate its effectiveness on three NLP scientific literature question answering datasets.
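To make the few-shot extraction idea concrete, here is a minimal sketch of how an LLM could be prompted with in-context examples to emit (head, relation, tail) triples, plus a parser for its output. The prompt wording, example triples, and parse format are all assumptions for illustration, not the authors' actual implementation.

```python
# Hypothetical few-shot prompt for triple extraction. The example text and
# relation names are invented for this sketch.
FEW_SHOT_PROMPT = """Extract (head, relation, tail) triples from the paper text.

Example:
Text: "BERT is a pre-trained language model evaluated on GLUE."
Triples:
(BERT, is_a, pre-trained language model)
(BERT, evaluated_on, GLUE)

Text: "{text}"
Triples:
"""

def parse_triples(llm_output: str):
    """Parse '(head, relation, tail)' lines emitted by the LLM into tuples."""
    triples = []
    for line in llm_output.strip().splitlines():
        line = line.strip()
        if line.startswith("(") and line.endswith(")"):
            parts = [p.strip() for p in line[1:-1].split(",")]
            if len(parts) == 3:
                triples.append(tuple(parts))
    return triples

# Demo with a canned model response (no API call is made here):
sample_output = "(NLP-AKG, built_from, ACL Anthology)\n(NLP-AKG, has_scale, 620353 entities)"
print(parse_triples(sample_output))
```

In a real pipeline, `FEW_SHOT_PROMPT.format(text=...)` would be sent to the LLM and `parse_triples` applied to its completion; triples from all papers would then be merged into the graph.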
Problem

Research questions and friction points this paper is trying to address.

Enhance accuracy in scientific responses
Capture deep conceptual relations
Construct NLP academic knowledge graph
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based few-shot knowledge graph
Deep conceptual relations capture
Sub-graph community summary method
Jiayin Lan
Harbin Institute of Technology, Harbin, China
Jiaqi Li
Joint Laboratory of HIT and iFLYTEK, Beijing, China; University of Science and Technology of China, Hefei, China
Baoxin Wang
iFLYTEK Research
Large Language Models · Grammatical Error Correction · Natural Language Processing
Ming Liu
Harbin Institute of Technology, Harbin, China
Dayong Wu
Joint Laboratory of HIT and iFLYTEK, Beijing, China
Shijin Wang
Tongji University
Scheduling · Maintenance
Bing Qin
Professor at Harbin Institute of Technology
Natural Language Processing · Information Extraction · Sentiment Analysis