🤖 AI Summary
This work proposes OntoEKG, a novel end-to-end pipeline that leverages large language models (LLMs) for enterprise knowledge graph ontology construction, a task traditionally reliant on costly manual effort and domain expertise. The approach first automatically extracts core classes and properties from unstructured data, then employs logical reasoning to arrange them into a hierarchical ontology, ultimately producing standard RDF output. The study introduces a two-stage modeling framework, establishes a benchmark spanning the data, finance, and logistics domains, and reports a fuzzy-match F1 score of 0.724 in the data domain. While validating the feasibility of LLM-driven ontology engineering, the work also highlights persistent challenges in scope definition and hierarchical reasoning.
📝 Abstract
Enterprise Knowledge Graphs have become essential for unifying heterogeneous data and enforcing semantic governance. However, the construction of their underlying ontologies remains a resource-intensive, manual process that relies heavily on domain expertise. This paper introduces OntoEKG, an LLM-driven pipeline designed to accelerate the generation of domain-specific ontologies from unstructured enterprise data. Our approach decomposes the modelling task into two distinct phases: an extraction module that identifies core classes and properties, and an entailment module that logically structures these elements into a hierarchy before serialising them into standard RDF. To address the lack of comprehensive benchmarks for end-to-end ontology construction, we introduce a new evaluation dataset derived from documents across the Data, Finance, and Logistics sectors. Experimental results highlight both the potential and the challenges of this approach: the pipeline achieves a fuzzy-match F1-score of 0.724 in the Data domain, while revealing limitations in scope definition and hierarchical reasoning.
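To make the reported metric concrete, the following is a minimal sketch of how a fuzzy-match F1-score over extracted class labels could be computed. The abstract does not specify the matching procedure, so everything here is an assumption for illustration: the function name `fuzzy_f1`, the use of Python's `difflib.SequenceMatcher` as the similarity measure, the 0.8 threshold, the greedy one-to-one matching, and the example labels are all hypothetical and not taken from the paper.

```python
from difflib import SequenceMatcher


def fuzzy_f1(predicted, gold, threshold=0.8):
    """Return (precision, recall, f1) under greedy one-to-one fuzzy matching.

    Assumption: a predicted label counts as correct when its similarity to a
    not-yet-matched gold label reaches `threshold`. The paper may use a
    different similarity measure or matching rule.
    """
    def similar(a, b):
        # Case-insensitive character-level similarity in [0, 1].
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    unmatched_gold = list(gold)
    true_positives = 0
    for label in predicted:
        # Pick the most similar gold label that has not been matched yet.
        best = max(unmatched_gold, key=lambda g: similar(label, g), default=None)
        if best is not None and similar(label, best) >= threshold:
            true_positives += 1
            unmatched_gold.remove(best)

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels, invented purely for illustration.
predicted_classes = ["Data Asset", "DataSets", "Invoice"]
gold_classes = ["DataAsset", "Dataset", "Data Steward"]
precision, recall, f1 = fuzzy_f1(predicted_classes, gold_classes)
```

With these invented labels, two of the three predictions find a gold counterpart, so precision, recall, and F1 all come out at 2/3. Exact string matching would score the same inputs at zero, which is the motivation for a tolerant metric when an LLM produces labels that differ from the reference only in spacing, casing, or pluralisation.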