🤖 AI Summary
Current large language models are constrained by context length, making it difficult to perform multi-hop reasoning and answer complex queries over enterprise-scale knowledge graphs. This work proposes a training-free, task-agnostic, tool-augmented framework that introduces, for the first time, a minimal orthogonal set of graph operations, enabling general-purpose large language models to traverse graph structures in a sequential, transparent, and verifiable manner and thereby accomplish multi-step reasoning. By combining tool-call-driven graph traversal, decomposition of multi-hop queries, and retrieval-augmented generation, the approach substantially outperforms in-context reasoning on both synthetic and enterprise-like knowledge graphs. The performance gain grows more pronounced with model scale, effectively mitigating the performance collapse that traditional methods exhibit on complex queries.
📝 Abstract
The use of knowledge graphs (KGs) for grounding agents in real-world Q&A applications has become increasingly common. Answering complex queries often requires multi-hop reasoning and the ability to navigate vast relational structures. Standard approaches rely on prompting techniques that steer large language models to reason over raw graph context, or on retrieval-augmented generation pipelines where relevant subgraphs are injected into the context. These, however, face severe limitations with enterprise-scale KGs that cannot fit in even the largest context windows available today. We present GraphWalk, a problem-agnostic, training-free, tool-based framework that allows off-the-shelf LLMs to reason through sequential graph navigation, dramatically increasing performance across different tasks. Unlike task-specific agent frameworks that encode domain knowledge into specialized tools, GraphWalk equips the LLM with a minimal set of orthogonal graph operations sufficient to traverse any graph structure. We evaluate whether models equipped with GraphWalk can compose these operations into correct multi-step reasoning chains, where each tool call represents a verifiable step, creating a transparent execution trace. We first demonstrate our approach on maze traversal, a problem that non-reasoning models are completely unable to solve, then present results on graphs resembling real-world enterprise knowledge graphs. To isolate structural reasoning from world knowledge, we evaluate on entirely synthetic graphs with random, non-semantic labels. Our benchmark spans 12 query templates, from basic retrieval to compound first-order logic queries. Results show that tool-based traversal yields substantial and consistent gains over in-context baselines across all model families tested, with gains becoming more pronounced as scale increases, precisely where in-context approaches fail catastrophically.
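To make the idea of composing a minimal, orthogonal set of graph operations into a multi-step reasoning chain concrete, here is a small sketch. The graph, the tool names (`get_neighbors`, `node_exists`), and their signatures are illustrative assumptions, not the paper's actual API; the point is that each call is one verifiable traversal step, so a multi-hop query decomposes into an inspectable sequence of primitive operations rather than a single pass over the raw graph in context.

```python
# Hypothetical sketch, NOT GraphWalk's real tool set: a toy knowledge graph
# as node -> list of (relation, neighbor) edges, plus two primitive tools.
GRAPH = {
    "alice":  [("knows", "bob"), ("works_at", "acme")],
    "bob":    [("works_at", "globex")],
    "acme":   [("located_in", "berlin")],
    "globex": [("located_in", "paris")],
    "berlin": [],
    "paris":  [],
}

def get_neighbors(node, relation=None):
    """Return neighbors of `node`, optionally filtered by edge relation."""
    return [dst for rel, dst in GRAPH.get(node, [])
            if relation is None or rel == relation]

def node_exists(node):
    """Check whether a node id is present in the graph."""
    return node in GRAPH

# A 3-hop query composed from the primitives, one verifiable step at a time:
# "In which city is the employer of Alice's acquaintance located?"
step1 = get_neighbors("alice", "knows")        # -> ['bob']
step2 = get_neighbors(step1[0], "works_at")    # -> ['globex']
step3 = get_neighbors(step2[0], "located_in")  # -> ['paris']
```

An LLM agent would issue each call as a tool invocation and read the result back before deciding the next step, which is what produces the transparent execution trace the abstract describes.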