🤖 AI Summary
Knowledge graph (KG) incompleteness and the structural complexity of first-order logic (FOL) queries—e.g., multiple operators, deep inference chains, and heterogeneous schemas—pose significant generalization bottlenecks for query answering. Method: We propose the first end-to-end framework that tightly couples query-aware subgraph retrieval with large language model (LLM)-based chain-of-thought reasoning. Without task-specific embedding training, our approach synergistically integrates neighborhood subgraph retrieval, FOL query decomposition, symbolic LLM reasoning, and context-enhanced prompting to jointly acquire structured evidence and perform logical inference. Contribution/Results: On standard benchmarks, our method achieves superior mean reciprocal rank (MRR) across all query complexities, significantly outperforming strong embedding-based baselines—especially on high-complexity queries. This validates the effectiveness and generalization advantage of the “retrieval + LLM reasoning” paradigm for complex FOL querying over KGs.
📝 Abstract
Reasoning over knowledge graphs (KGs) with first-order logic (FOL) queries is challenging due to the inherent incompleteness of real-world KGs and the compositional complexity of logical query structures. Most existing methods rely on embedding entities and relations into continuous geometric spaces and answer queries via differentiable set operations. While effective for simple query patterns, these approaches often struggle to generalize to complex queries involving multiple operators, deeper reasoning chains, or heterogeneous KG schemas. We propose ROG (Reasoning Over knowledge Graphs with large language models), an ensemble-style framework that combines query-aware KG neighborhood retrieval with large language model (LLM)-based chain-of-thought reasoning. ROG decomposes complex FOL queries into sequences of simpler sub-queries, retrieves compact, query-relevant subgraphs as contextual evidence, and performs step-by-step logical inference using an LLM, avoiding the need for task-specific embedding optimization. Experiments on standard KG reasoning benchmarks demonstrate that ROG consistently outperforms strong embedding-based baselines in terms of mean reciprocal rank (MRR), with particularly notable gains on high-complexity query types. These results suggest that integrating structured KG retrieval with LLM-driven logical reasoning offers a robust and effective alternative for complex KG reasoning tasks.