Characterising LLM-Generated Competency Questions: a Cross-Domain Empirical Study using Open and Closed Models

📅 2026-04-17

🤖 AI Summary
This study addresses the challenge of characterising competency questions (CQs) automatically generated by large language models (LLMs) for ontology engineering. It proposes a set of quantitative measures that assess CQs along three dimensions: readability, relevance to the input text, and structural complexity. Through systematic evaluation across multiple domain-specific use cases, the work compares CQs produced by open models (e.g., Llama3.1-8B) and closed models (e.g., GPT-4.1, Gemini 2.5 Pro), and finds that each model exhibits a distinct generation profile shaped by the use case. The results offer empirical guidance for selecting appropriate LLMs in automated ontology engineering tasks.

📝 Abstract
Competency Questions (CQs) are a cornerstone of requirement elicitation in ontology engineering. CQs represent requirements as a set of natural language questions that an ontology should be able to answer; they are traditionally modelled by ontology engineers together with domain experts as part of a human-centred, manual elicitation process. Generative AI can automate CQ creation at scale, thereby democratising the generation process, widening stakeholder engagement, and ultimately broadening access to ontology engineering. However, the landscape of LLMs is large and heterogeneous, varying in parameter scale, task and domain specialisation, and accessibility. It is therefore crucial to characterise the intrinsic, observable properties of the CQs these models produce (e.g., readability, structural complexity) through a systematic, cross-domain analysis. This paper introduces a set of quantitative measures for the systematic comparison of CQs across multiple dimensions. Using CQs generated from well-defined use cases and scenarios, we identify their salient properties, including readability, relevance with respect to the input text, and structural complexity of the generated questions. We conduct our experiments over a set of use cases and requirements using a range of LLMs, including both open models (KimiK2-1T, Llama3.1-8B, Llama3.2-3B) and closed models (Gemini 2.5 Pro, GPT-4.1). Our analysis demonstrates that LLM performance reflects distinct generation profiles shaped by the use case.
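
The abstract names three dimensions (readability, relevance to the input text, structural complexity) but does not state which measures the paper uses. As a rough illustration only, the sketch below assumes Flesch Reading Ease for readability, Jaccard token overlap for relevance, and token and conjunction counts for structural complexity. All function names are hypothetical and these proxies are stand-ins, not the paper's metrics.

```python
import re

# Assumption: "and"/"or" are used as a crude signal of compound questions.
CONJUNCTIONS = {"and", "or"}


def tokenize(text):
    """Lowercase alphanumeric tokens; deliberately simple."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_syllables(word):
    """Rough syllable estimate: number of vowel groups, at least 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(question):
    """Flesch Reading Ease, treating the question as one sentence."""
    words = tokenize(question)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * len(words) - 84.6 * (syllables / len(words))


def jaccard_relevance(question, source_text):
    """Share of distinct tokens common to the question and the input text."""
    q, s = set(tokenize(question)), set(tokenize(source_text))
    union = q | s
    return len(q & s) / len(union) if union else 0.0


def structural_complexity(question):
    """Proxy for complexity: token count and number of conjunctions."""
    tokens = tokenize(question)
    return {
        "tokens": len(tokens),
        "conjunctions": sum(1 for t in tokens if t in CONJUNCTIONS),
    }


def profile(question, source_text):
    """Collect the three illustrative measures for one CQ."""
    return {
        "readability": flesch_reading_ease(question),
        "relevance": jaccard_relevance(question, source_text),
        "complexity": structural_complexity(question),
    }


if __name__ == "__main__":
    # Hypothetical use case text and generated CQ, for demonstration only.
    use_case = "Sensors measure temperature and humidity."
    cq = "Which sensors measure temperature?"
    print(profile(cq, use_case))
```

Averaging such profiles per model and per use case would give one simple way to compare generation profiles across models, under the assumptions stated above.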

Problem

Research questions and friction points this paper is trying to address.

Competency Questions
Large Language Models
Ontology Engineering
Cross-Domain Analysis
Generative AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

Competency Questions
Large Language Models
Ontology Engineering
Cross-Domain Evaluation
Quantitative Metrics