A RAG Approach for Generating Competency Questions in Ontology Engineering

📅 2024-09-13
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address a bottleneck in ontology engineering, where competency questions (CQs) are constructed manually in a time-consuming, labor-intensive process that relies heavily on domain experts, this paper proposes a retrieval-augmented generation (RAG) method for automated CQ generation. Using raw scientific papers as the knowledge source, the approach requires no pre-built ontology or knowledge graph; instead, it dynamically retrieves semantically relevant literature and prompts GPT-4, in both zero-shot and context-enhanced settings, to generate CQs. The work introduces the RAG paradigm to CQ generation and systematically investigates the impact of corpus size and LLM temperature. Evaluated on two metrics (precision and consistency), the method outperforms a zero-shot prompting baseline on two ontology engineering tasks, improving CQ accuracy and alignment with expert-written CQs and demonstrating the practicality of literature-driven RAG for automated CQ generation.
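The retrieve-then-prompt loop described above can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the bag-of-words cosine retriever stands in for whatever semantic retrieval the paper uses, and the function names (`retrieve_papers`, `build_prompt`) are assumptions.

```python
# Hedged sketch of a literature-driven RAG loop for CQ generation.
# The retriever and prompt template are illustrative assumptions,
# not the paper's actual implementation.
import math
from collections import Counter

def _vec(text: str) -> Counter:
    # Crude bag-of-words vector; a real system would use embeddings.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_papers(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank papers by similarity to the task description, keep top-k.
    qv = _vec(query)
    ranked = sorted(corpus, key=lambda d: cosine(qv, _vec(d)), reverse=True)
    return ranked[:k]

def build_prompt(domain: str, papers: list[str]) -> str:
    # Context-enhanced prompt: retrieved literature + the generation task.
    context = "\n\n".join(f"[Paper {i + 1}] {p}" for i, p in enumerate(papers))
    return (f"Context:\n{context}\n\n"
            f"Task: Generate competency questions for an ontology "
            f"about {domain}.")
```

The assembled prompt would then be sent to GPT-4; in the zero-shot baseline the `Context:` block is simply omitted.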

📝 Abstract
Competency question (CQ) formulation is central to several ontology development and evaluation methodologies. Traditionally, crafting these questions relies heavily on domain experts and knowledge engineers and is often time-consuming and labor-intensive. With the emergence of Large Language Models (LLMs), there is an opportunity to automate and enhance this process. Unlike similar works that use existing ontologies or knowledge graphs as input to LLMs, we present a retrieval-augmented generation (RAG) approach that uses LLMs to automatically generate CQs from a set of scientific papers treated as a domain knowledge base. We investigate its performance and, specifically, study the impact of the number of papers supplied to the RAG pipeline and of the LLM's temperature setting. We conduct experiments with GPT-4 on two domain ontology engineering tasks and compare the results against ground-truth CQs constructed by domain experts. Empirical assessment using two evaluation metrics (precision and consistency) reveals that, compared to zero-shot prompting, adding relevant domain knowledge via RAG improves the performance of LLMs in generating CQs for concrete ontology engineering tasks.
Problem

Research questions and friction points this paper is trying to address.

Automate competency question generation using RAG and LLMs.
Enhance CQ formulation efficiency in ontology engineering.
Evaluate RAG performance with varying domain knowledge volumes.
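The evaluation point above compares generated CQs against expert ground truth. As a rough illustration of how such a precision score could be computed, the sketch below scores a generated CQ as correct when it sufficiently overlaps some expert CQ; the word-overlap matcher, the threshold, and the helper names are assumptions, and the paper's actual precision and consistency metrics may be defined differently (e.g. via expert judgment).

```python
# Toy precision check for generated CQs against expert ground truth.
# The Jaccard word-overlap matcher and 0.4 threshold are illustrative
# assumptions, not the paper's metric definitions.
def overlap(a: str, b: str) -> float:
    # Jaccard similarity over lowercased word sets.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def cq_precision(generated: list[str], gold: list[str],
                 thresh: float = 0.4) -> float:
    # Fraction of generated CQs that match at least one expert CQ.
    hits = sum(any(overlap(g, e) >= thresh for e in gold) for g in generated)
    return hits / len(generated) if generated else 0.0
```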
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-augmented generation for CQs
LLMs with scientific papers input
Domain knowledge enhances LLM performance
Xueli Pan
Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, Netherlands
J. V. Ossenbruggen
Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, Netherlands
Victor de Boer
Associate Professor, VU University Amsterdam
Semantic Web · ICT4D · Digital Humanities · Artificial Intelligence · Human-Computer Interaction
Zhisheng Huang
Senior Researcher in Computer Science, Vrije Universiteit Amsterdam
Artificial Intelligence · Semantic Web · Logics