🤖 AI Summary
To reduce the manual effort of deriving rich ontologies when transforming relational databases into semantically enriched knowledge graphs, this paper proposes RIGOR, a retrieval-augmented, iterative ontology generation framework. RIGOR retrieves context from three sources (the database schema and its documentation, a repository of domain ontologies, and a growing core ontology) and processes the database table by table, following foreign-key constraints. A generative LLM produces provenance-tagged delta fragments of an OWL ontology, and a judge LLM refines each fragment before it is merged into the core ontology. This makes ontology construction incremental and traceable while requiring minimal human effort. On real-world databases, RIGOR's ontologies score highly on standard quality dimensions, including accuracy, completeness, conciseness, adaptability, clarity, and consistency, while substantially reducing manual effort. The resulting ontologies support semantic interoperability and graph-based learning and reasoning over the data.
📝 Abstract
Transforming relational databases into knowledge graphs with enriched ontologies enhances semantic interoperability and unlocks advanced graph-based learning and reasoning over data. However, previous approaches either demand significant manual effort to derive an ontology from a database schema or produce only a basic ontology. We present RIGOR, Retrieval-augmented Iterative Generation of RDB Ontologies, an LLM-driven approach that turns relational schemas into rich OWL ontologies with minimal human effort. RIGOR uses RAG to combine three sources (the database schema and its documentation, a repository of domain ontologies, and a growing core ontology) and prompts a generative LLM to produce successive, provenance-tagged delta ontology fragments. Each fragment is refined by a judge-LLM before being merged into the core ontology, and the process iterates table by table, following foreign key constraints, until all tables are covered. Applied to real-world databases, our approach produces ontologies that score highly on standard quality dimensions such as accuracy, completeness, conciseness, adaptability, clarity, and consistency, while substantially reducing manual effort.
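The table-by-table loop described in the abstract can be sketched as follows. This is a minimal, hypothetical Python illustration, not RIGOR's implementation: the LLM calls are replaced by rule-based stand-ins (`generate_fragment` for the generative LLM, `judge` for the judge-LLM), retrieval covers only the growing core ontology (schema documentation and the domain-ontology repository are omitted), the ontology is a plain set of triples, and the breadth-first foreign-key ordering in `fk_order` is an assumed traversal strategy. All names are illustrative.

```python
from collections import deque
from dataclasses import dataclass, field

# A triple is (subject, predicate, object) using prefixed names such as ":orders".
Triple = tuple[str, str, str]


@dataclass
class Table:
    name: str
    columns: list[str]
    # Maps a foreign-key column to the table it references.
    foreign_keys: dict[str, str] = field(default_factory=dict)


def fk_order(tables: dict[str, Table], start: str) -> list[str]:
    """Breadth-first traversal of the foreign-key graph, with edges treated as undirected."""
    nbrs: dict[str, set[str]] = {name: set() for name in tables}
    for t in tables.values():
        for ref in t.foreign_keys.values():
            if ref in nbrs:
                nbrs[t.name].add(ref)
                nbrs[ref].add(t.name)
    order: list[str] = []
    seen: set[str] = set()
    # Begin at `start`, then restart from any table not reachable through foreign keys.
    for root in [start, *sorted(tables)]:
        if root in seen:
            continue
        seen.add(root)
        queue = deque([root])
        while queue:
            cur = queue.popleft()
            order.append(cur)
            for nxt in sorted(nbrs[cur]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return order


def retrieve_context(core: set[Triple], table: Table) -> set[Triple]:
    """Stand-in for RAG over the core ontology: axioms about classes this table references."""
    refs = {f":{r}" for r in table.foreign_keys.values()}
    return {t for t in core if t[0] in refs}


def generate_fragment(table: Table, context: set[Triple]) -> set[Triple]:
    """Rule-based stand-in for the generative LLM: a delta fragment for one table."""
    cls = f":{table.name}"
    delta: set[Triple] = {(cls, "rdf:type", "owl:Class")}
    for col in table.columns:
        prop = f":{table.name}_{col}"
        if col in table.foreign_keys:
            target = f":{table.foreign_keys[col]}"
            delta |= {(prop, "rdf:type", "owl:ObjectProperty"),
                      (prop, "rdfs:domain", cls),
                      (prop, "rdfs:range", target)}
            # Declare the referenced class only if retrieval did not surface it.
            if (target, "rdf:type", "owl:Class") not in context:
                delta.add((target, "rdf:type", "owl:Class"))
        else:
            delta |= {(prop, "rdf:type", "owl:DatatypeProperty"),
                      (prop, "rdfs:domain", cls)}
    return delta


def judge(delta: set[Triple], core: set[Triple]) -> set[Triple]:
    """Stand-in for the judge-LLM: drop axioms the core ontology already states."""
    return delta - core


def build_ontology(tables: dict[str, Table], start: str) -> tuple[set[Triple], dict[Triple, str]]:
    core: set[Triple] = set()
    provenance: dict[Triple, str] = {}
    for name in fk_order(tables, start):
        table = tables[name]
        delta = judge(generate_fragment(table, retrieve_context(core, table)), core)
        for triple in delta:
            core.add(triple)
            provenance[triple] = name  # tag each merged axiom with the table that produced it
    return core, provenance


tables = {
    "customer": Table("customer", ["id", "name"]),
    "orders": Table("orders", ["id", "customer_id", "total"], {"customer_id": "customer"}),
}
core, provenance = build_ontology(tables, start="orders")
```

In this toy run, `orders` is processed first and declares `:customer` as a class because the referenced table has not been visited yet. When `customer` is processed, the `judge` stand-in drops the duplicate declaration, so that axiom's provenance tag still points to `orders`. A real judge-LLM would be expected to refine fragments far more substantively than duplicate removal.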