Better Later Than Sooner: Neuro-Symbolic Knowledge Graph Construction via Ontology-grounded Post-extraction Correction

📅 2026-05-27
🤖 AI Summary
This work addresses the problem that large language models (LLMs) used for knowledge graph extraction often generate facts that violate ontological constraints, which hinders symbolic reasoning and complex querying. The authors propose a neuro-symbolic framework that defers ontology-aware consistency correction to a post-processing stage after open-domain extraction. The framework combines embedding-based canonicalization of entity types and predicates with targeted, ontology-guided LLM correction of violations. Because corrections are applied after extraction rather than through repeated LLM calls, token usage drops substantially. Experiments show that the method reduces ontology violations while preserving question-answering quality, and an analysis of SPARQL graph pattern occurrence indicates that the resulting knowledge graphs are well suited to symbolic querying.
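To make the canonicalization step concrete, here is a minimal sketch of mapping a raw extracted predicate onto a fixed vocabulary by nearest-neighbor similarity. It is an illustration under stated assumptions, not the authors' implementation: `embed` is a toy character-trigram stand-in for a real embedding model, and the names `canonicalize`, `vocabulary`, and the `0.5` threshold are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: padded character-trigram counts."""
    s = f"  {text.lower().replace('_', ' ')}  "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def canonicalize(label, vocabulary, threshold=0.5):
    """Return the closest canonical label, or None if nothing is similar enough.

    The threshold is an assumed hyperparameter; a real system would tune it.
    """
    if not vocabulary:
        return None
    query = embed(label)
    best = max(vocabulary, key=lambda c: cosine(query, embed(c)))
    return best if cosine(query, embed(best)) >= threshold else None
```

With the vocabulary `["born in", "works for", "located in"]`, the raw predicate `"was_born_in"` maps to `"born in"`, while an unrelated label falls below the threshold and returns `None`, leaving it to be handled separately.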
📝 Abstract
Question answering (QA) is a core challenge in AI, particularly for complex queries requiring multi-hop reasoning across documents, or symbolic operations like aggregation or exhaustive listing. Retrieval-augmented generation (RAG) has become the dominant approach to QA, with recent graph-based variants addressing part of these issues by organizing knowledge to better support compositional questions. However, most textual graph-based RAG methods still lack the structure needed for the symbolic operations that complex questions require to be answered reliably. This motivates symbolic graph-based approaches, which extract knowledge graphs (KGs) whose relations are logic predicates that enable SQL-like querying. Yet these pipelines typically use LLMs for KG extraction, which can introduce consistency issues: extracted facts may violate commonsense ontology constraints. We propose a neuro-symbolic framework for ontology-grounded KG construction combining open-domain extraction, embedding-based canonicalization of types and predicates, and targeted LLM-based correction of ontology violations. By deferring corrections to a post-extraction stage, our method avoids repeated LLM calls, substantially reducing token usage while improving KG consistency and preserving downstream QA quality. Finally, we show that the extracted KGs are well suited for symbolic querying by measuring the occurrence of SPARQL graph patterns.
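The idea of targeted post-extraction correction can be sketched as follows: check every extracted triple against domain and range constraints, and invoke the corrector only on the triples that fail. This is a hedged illustration, not the paper's pipeline. The `ONTOLOGY` table, the `Triple` type, and the `corrector` callable (standing in for an LLM call) are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


# Hypothetical ontology: predicate -> (required subject type, required object type).
ONTOLOGY = {
    "born_in": ("Person", "Place"),
    "works_for": ("Person", "Organization"),
}


def find_violations(triples, entity_types, ontology=ONTOLOGY):
    """Return triples whose subject or object type breaks a domain/range constraint."""
    violations = []
    for t in triples:
        if t.predicate not in ontology:
            continue  # no constraint known for this predicate
        domain, range_ = ontology[t.predicate]
        if entity_types.get(t.subject) != domain or entity_types.get(t.obj) != range_:
            violations.append(t)
    return violations


def correct_graph(triples, entity_types, corrector, ontology=ONTOLOGY):
    """Call `corrector` only on violating triples; keep all others unchanged.

    `corrector` returns a repaired Triple, or None to drop the fact.
    """
    bad = set(find_violations(triples, entity_types, ontology))
    result = []
    for t in triples:
        if t not in bad:
            result.append(t)
            continue
        fixed = corrector(t)
        if fixed is not None:
            result.append(fixed)
    return result
```

Because consistent triples bypass the corrector entirely, the number of corrector calls scales with the number of violations rather than with the size of the graph, which is the cost argument the abstract makes for deferring correction.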
Problem

Research questions and friction points this paper is trying to address.

knowledge graph construction
ontology consistency
symbolic reasoning
complex question answering
LLM extraction errors
Innovation

Methods, ideas, or system contributions that make the work stand out.

neuro-symbolic
ontology-grounded correction
post-extraction refinement
knowledge graph construction
retrieval-augmented QA