AI Summary
To address the high construction cost, uneven interdisciplinary coverage, and delayed updates of research-domain ontologies, this paper proposes the first large language model (LLM)-based framework for automated, multi-disciplinary ontology generation. We introduce PEM-Rel-8K, a high-quality, manually curated relation extraction dataset spanning the biomedical, physics, and engineering domains, and systematically evaluate LLMs under zero-shot prompting, chain-of-thought prompting, and fine-tuning paradigms for cross-domain semantic relation identification. Experimental results demonstrate that models fine-tuned on PEM-Rel-8K achieve state-of-the-art performance across all three disciplines, significantly outperforming existing baselines while exhibiting robust cross-domain transferability. This work establishes a scalable, low-cost paradigm for the automated construction and dynamic evolution of scientific knowledge graphs.
Abstract
Ontologies and taxonomies of research fields are critical for managing and organising scientific knowledge, as they facilitate efficient classification, dissemination, and retrieval of information. However, creating and maintaining such ontologies is expensive and time-consuming, usually requiring the coordinated effort of multiple domain experts. Consequently, ontologies in this space often exhibit uneven coverage across disciplines, limited inter-domain connectivity, and infrequent update cycles. In this study, we investigate the capability of several large language models to identify semantic relationships among research topics within three academic domains: biomedicine, physics, and engineering. The models were evaluated under three distinct conditions: zero-shot prompting, chain-of-thought prompting, and fine-tuning on existing ontologies. Additionally, we assessed the cross-domain transferability of fine-tuned models by measuring their performance when trained on one domain and subsequently applied to a different one. To support this analysis, we introduce PEM-Rel-8K, a novel dataset consisting of over 8,000 relationships extracted from the most widely adopted taxonomies in the three disciplines considered in this study: MeSH, PhySH, and IEEE. Our experiments demonstrate that fine-tuning LLMs on PEM-Rel-8K yields excellent performance across all disciplines.
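To make the zero-shot setting above concrete, the sketch below builds a relation-classification prompt for a pair of research topics and maps a free-text model response onto a fixed label set. The label set, prompt wording, and helper names are illustrative assumptions, not the paper's actual protocol or dataset schema.

```python
# Hypothetical sketch of zero-shot semantic relation classification
# between research topics. The labels below are an assumed example
# inventory (taxonomies like MeSH/PhySH/IEEE express broader/narrower
# links); they are not taken from PEM-Rel-8K itself.

LABELS = ["broader", "narrower", "same-as", "other"]

def build_prompt(topic_a: str, topic_b: str) -> str:
    """Compose a zero-shot prompt asking an LLM for one relation label."""
    return (
        "You are an expert in organising research fields into taxonomies.\n"
        f"Given the research topics '{topic_a}' and '{topic_b}', choose the "
        f"single relation that best links the first to the second: "
        f"{', '.join(LABELS)}.\n"
        "Answer with the label only."
    )

def parse_label(response: str) -> str:
    """Map a free-text model response onto one of the allowed labels."""
    text = response.strip().lower()
    for label in LABELS:
        if label in text:
            return label
    return "other"  # fall back when the model answers off-format

# Example: 'Deep Learning' is narrower than 'Machine Learning'.
prompt = build_prompt("Deep Learning", "Machine Learning")
print(parse_label("The relation is: narrower"))  # → narrower
```

The chain-of-thought condition would differ only in the prompt (asking the model to reason step by step before emitting a label), and the fine-tuning condition would train on labelled topic pairs instead of relying on the prompt alone.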