Bench4KE: Benchmarking Automated Competency Question Generation

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Automated knowledge engineering (KE) tools lack standardized evaluation, hindering methodological rigor and result reproducibility. Method: We introduce the first scalable API-based benchmark platform for LLM-driven ontology competency question (CQ) generation. It features the first gold-standard CQ dataset spanning four real-world ontology projects and a multidimensional evaluation framework integrating semantic similarity (BERTScore, ROUGE, BLEU), logical consistency, and diversity. The platform adopts a modular API architecture, enabling extensibility to downstream KE tasks such as SPARQL query generation. Contribution/Results: Using this platform, we conduct a comprehensive benchmark of four state-of-the-art LLM-CQ systems, establishing performance baselines. All code, datasets, and evaluation frameworks are publicly released under the Apache 2.0 license to advance standardization in automated KE evaluation.
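The summary names BERTScore, ROUGE, and BLEU as the semantic-similarity components. As a minimal sketch of how such a metric suite could score a generated CQ against a gold-standard set, the Python snippet below uses the open-source `bert-score`, `rouge-score`, and `nltk` packages; the function `score_cq` and the best-match-over-gold-set scoring scheme are illustrative assumptions, not Bench4KE's actual implementation.

```python
# Illustrative sketch (not Bench4KE's code): scoring one generated CQ
# against a gold-standard CQ set with BERTScore, ROUGE-L, and BLEU.
from bert_score import score as bert_score          # pip install bert-score
from rouge_score import rouge_scorer                # pip install rouge-score
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def score_cq(generated: str, gold_cqs: list[str]) -> dict:
    """Return the best similarity of `generated` against any gold CQ."""
    # BERTScore: pair the candidate with every gold CQ, keep the best F1.
    _, _, f1 = bert_score([generated] * len(gold_cqs), gold_cqs, lang="en")
    best_bert = f1.max().item()

    # ROUGE-L F-measure, maximised over the gold set.
    rs = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    best_rouge = max(rs.score(g, generated)["rougeL"].fmeasure for g in gold_cqs)

    # Sentence-level BLEU with smoothing (CQs are short single sentences).
    smooth = SmoothingFunction().method1
    best_bleu = max(
        sentence_bleu([g.split()], generated.split(), smoothing_function=smooth)
        for g in gold_cqs
    )
    return {"bertscore_f1": best_bert, "rougeL": best_rouge, "bleu": best_bleu}

print(score_cq(
    "What are the ingredients of a recipe?",
    ["Which ingredients does a recipe contain?", "Who is the author of a recipe?"],
))
```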

📝 Abstract
The availability of Large Language Models (LLMs) presents a unique opportunity to reinvigorate research on Knowledge Engineering (KE) automation, a trend already evident in recent efforts developing LLM-based methods and tools for the automatic generation of Competency Questions (CQs). However, the evaluation of these tools lacks standardisation. This undermines the methodological rigour and hinders the replication and comparison of results. To address this gap, we introduce Bench4KE, an extensible API-based benchmarking system for KE automation. Its first release focuses on evaluating tools that generate CQs automatically. CQs are natural language questions used by ontology engineers to define the functional requirements of an ontology. Bench4KE provides a curated gold standard consisting of CQ datasets from four real-world ontology projects. It uses a suite of similarity metrics to assess the quality of the CQs generated. We present a comparative analysis of four recent CQ generation systems, which are based on LLMs, establishing a baseline for future research. Bench4KE is also designed to accommodate additional KE automation tasks, such as SPARQL query generation, ontology testing and drafting. Code and datasets are publicly available under the Apache 2.0 license.
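Since Bench4KE is described as an API-based system, a CQ generation tool would presumably submit its output over HTTP for evaluation. The snippet below is a hypothetical illustration of that workflow only: the base URL, the `/evaluate/cq` endpoint path, and the payload fields are assumptions made for this example, not the documented Bench4KE API.

```python
# Hypothetical illustration of driving an API-based CQ benchmark over HTTP.
# The base URL, endpoint path, and payload fields are assumed, not documented.
import requests

BASE_URL = "http://localhost:8000"  # assumed local Bench4KE-style deployment

payload = {
    "task": "cq-generation",
    "dataset": "gold-standard-project-1",   # one of the four gold CQ datasets
    "generated_cqs": [
        "Which ingredients does a recipe contain?",
        "Who is the author of a recipe?",
    ],
}

resp = requests.post(f"{BASE_URL}/evaluate/cq", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())  # e.g. per-metric scores such as BERTScore, ROUGE, BLEU
```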
Problem

Research questions and friction points this paper is trying to address.

Lack of standardized evaluation for LLM-based CQ generation tools
Need for rigorous benchmarking in Knowledge Engineering automation
Absence of comparative baselines for Competency Question generation systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

API-based benchmarking system for KE automation
Suite of similarity metrics (BERTScore, ROUGE, BLEU) to assess CQ quality
Extensible to additional KE automation tasks, such as SPARQL query generation (see the design sketch after this list)
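One common way to realise this kind of extensibility is a task-plugin pattern, sketched generically below. All names here (the `KETask` base class, `register_task`, `TASK_REGISTRY`) are invented for illustration; Bench4KE's internal design may differ.

```python
# Generic task-plugin pattern illustrating an extensible KE benchmark design.
# All names are invented for illustration; Bench4KE's internals may differ.
from abc import ABC, abstractmethod

TASK_REGISTRY: dict[str, type["KETask"]] = {}

def register_task(name: str):
    """Class decorator that registers a benchmark task under `name`."""
    def wrap(cls: type["KETask"]) -> type["KETask"]:
        TASK_REGISTRY[name] = cls
        return cls
    return wrap

class KETask(ABC):
    """A benchmarkable KE automation task (CQ generation, SPARQL, ...)."""
    @abstractmethod
    def evaluate(self, outputs: list[str], gold: list[str]) -> dict:
        ...

@register_task("cq-generation")
class CQGenerationTask(KETask):
    def evaluate(self, outputs, gold):
        # A real system would call a similarity-metric suite here; this
        # placeholder just reports exact-string overlap with the gold set.
        overlap = len(set(outputs) & set(gold)) / max(len(gold), 1)
        return {"exact_overlap": overlap}

# A future SPARQL-generation task would simply add another @register_task class.
task = TASK_REGISTRY["cq-generation"]()
print(task.evaluate(["Who wrote a recipe?"], ["Who wrote a recipe?"]))
```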
👥 Authors
Anna Sofia Lippolis
University of Bologna, ISTC-CNR
Minh Davide Ragagni
University of Bologna, 40126 Bologna, Italy
P. Ciancarini
University of Bologna, 40126 Bologna, Italy
Andrea Giovanni Nuzzolese
Senior Researcher, CNR-ISTC
Web Science · Semantic Web · Linked Data · Ontology Design · Knowledge Extraction
V. Presutti
University of Bologna, 40126 Bologna, Italy; CNR - Institute of Cognitive Sciences and Technologies, Rome & Bologna, Italy