Bench4KE: Benchmarking Automated Competency Question Generation

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Automated knowledge engineering (KE) tools lack standardized evaluation, hindering methodological rigor and result reproducibility. Method: We introduce the first scalable API-based benchmark platform for LLM-driven ontology competency question (CQ) generation. It features the first gold-standard CQ dataset spanning four real-world ontology projects and a multidimensional evaluation framework integrating semantic similarity (BERTScore, ROUGE, BLEU), logical consistency, and diversity. The platform adopts a modular API architecture, enabling extensibility to downstream KE tasks such as SPARQL query generation. Contribution/Results: Using this platform, we conduct a comprehensive benchmark of four state-of-the-art LLM-CQ systems, establishing performance baselines. All code, datasets, and evaluation frameworks are publicly released under the Apache 2.0 license to advance standardization in automated KE evaluation.
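The summary names BERTScore, ROUGE, and BLEU as the semantic-similarity components. As a minimal sketch of how such a metric suite could score a generated CQ against a gold-standard set, the Python snippet below uses the open-source `bert-score`, `rouge-score`, and `nltk` packages; the function `score_cq` and the best-match-over-gold-set scoring scheme are illustrative assumptions, not Bench4KE's actual implementation.

```python
# Illustrative sketch (not Bench4KE's code): scoring one generated CQ
# against a gold-standard CQ set with BERTScore, ROUGE-L, and BLEU.
from bert_score import score as bert_score          # pip install bert-score
from rouge_score import rouge_scorer                # pip install rouge-score
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def score_cq(generated: str, gold_cqs: list[str]) -> dict:
    """Return the best similarity of `generated` against any gold CQ."""
    # BERTScore: pair the candidate with every gold CQ, keep the best F1.
    _, _, f1 = bert_score([generated] * len(gold_cqs), gold_cqs, lang="en")
    best_bert = f1.max().item()

    # ROUGE-L F-measure, maximised over the gold set.
    rs = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    best_rouge = max(rs.score(g, generated)["rougeL"].fmeasure for g in gold_cqs)

    # Sentence-level BLEU with smoothing (CQs are short single sentences).
    smooth = SmoothingFunction().method1
    best_bleu = max(
        sentence_bleu([g.split()], generated.split(), smoothing_function=smooth)
        for g in gold_cqs
    )
    return {"bertscore_f1": best_bert, "rougeL": best_rouge, "bleu": best_bleu}

print(score_cq(
    "What are the ingredients of a recipe?",
    ["Which ingredients does a recipe contain?", "Who is the author of a recipe?"],
))
```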

📝 Abstract
The availability of Large Language Models (LLMs) presents a unique opportunity to reinvigorate research on Knowledge Engineering (KE) automation, a trend already evident in recent efforts developing LLM-based methods and tools for the automatic generation of Competency Questions (CQs). However, the evaluation of these tools lacks standardisation. This undermines the methodological rigour and hinders the replication and comparison of results. To address this gap, we introduce Bench4KE, an extensible API-based benchmarking system for KE automation. Its first release focuses on evaluating tools that generate CQs automatically. CQs are natural language questions used by ontology engineers to define the functional requirements of an ontology. Bench4KE provides a curated gold standard consisting of CQ datasets from four real-world ontology projects. It uses a suite of similarity metrics to assess the quality of the CQs generated. We present a comparative analysis of four recent CQ generation systems, which are based on LLMs, establishing a baseline for future research. Bench4KE is also designed to accommodate additional KE automation tasks, such as SPARQL query generation, ontology testing and drafting. Code and datasets are publicly available under the Apache 2.0 license.
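Since Bench4KE is described as an API-based system, a CQ generation tool would presumably submit its output over HTTP for evaluation. The snippet below is a hypothetical illustration of that workflow only: the base URL, the `/evaluate/cq` endpoint path, and the payload fields are assumptions made for this example, not the documented Bench4KE API.

```python
# Hypothetical illustration of driving an API-based CQ benchmark over HTTP.
# The base URL, endpoint path, and payload fields are assumed, not documented.
import requests

BASE_URL = "http://localhost:8000"  # assumed local Bench4KE-style deployment

payload = {
    "task": "cq-generation",
    "dataset": "gold-standard-project-1",   # one of the four gold CQ datasets
    "generated_cqs": [
        "Which ingredients does a recipe contain?",
        "Who is the author of a recipe?",
    ],
}

resp = requests.post(f"{BASE_URL}/evaluate/cq", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())  # e.g. per-metric scores such as BERTScore, ROUGE, BLEU
```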
Problem

Research questions and friction points this paper is trying to address.

Lack of standardized evaluation for LLM-based CQ generation tools
Need for rigorous benchmarking in Knowledge Engineering automation
Absence of comparative baselines for Competency Question generation systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

API-based benchmarking system for KE automation
Suite of similarity metrics (BERTScore, ROUGE, BLEU) to assess CQ quality
Extensible to additional KE automation tasks, such as SPARQL query generation (see the design sketch after this list)
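One common way to realise this kind of extensibility is a task-plugin pattern, sketched generically below. All names here (the `KETask` base class, `register_task`, `TASK_REGISTRY`) are invented for illustration; Bench4KE's internal design may differ.

```python
# Generic task-plugin pattern illustrating an extensible KE benchmark design.
# All names are invented for illustration; Bench4KE's internals may differ.
from abc import ABC, abstractmethod

TASK_REGISTRY: dict[str, type["KETask"]] = {}

def register_task(name: str):
    """Class decorator that registers a benchmark task under `name`."""
    def wrap(cls: type["KETask"]) -> type["KETask"]:
        TASK_REGISTRY[name] = cls
        return cls
    return wrap

class KETask(ABC):
    """A benchmarkable KE automation task (CQ generation, SPARQL, ...)."""
    @abstractmethod
    def evaluate(self, outputs: list[str], gold: list[str]) -> dict:
        ...

@register_task("cq-generation")
class CQGenerationTask(KETask):
    def evaluate(self, outputs, gold):
        # A real system would call a similarity-metric suite here; this
        # placeholder just reports exact-string overlap with the gold set.
        overlap = len(set(outputs) & set(gold)) / max(len(gold), 1)
        return {"exact_overlap": overlap}

# A future SPARQL-generation task would simply add another @register_task class.
task = TASK_REGISTRY["cq-generation"]()
print(task.evaluate(["Who wrote a recipe?"], ["Who wrote a recipe?"]))
```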
👥 Authors
Anna Sofia Lippolis
University of Bologna, ISTC-CNR
Minh Davide Ragagni
University of Bologna, 40126 Bologna, Italy
P. Ciancarini
University of Bologna, 40126 Bologna, Italy
Andrea Giovanni Nuzzolese
Senior Researcher, CNR-ISTC
Web Science · Semantic Web · Linked Data · Ontology Design · Knowledge Extraction
V. Presutti
University of Bologna, 40126 Bologna, Italy; CNR - Institute of Cognitive Sciences and Technologies, Rome & Bologna, Italy