🤖 AI Summary
Current large language models are constrained by context length, making it difficult to perform multi-hop reasoning and answer complex queries over enterprise-scale knowledge graphs. This work proposes a training-free, task-agnostic, tool-augmented framework that introduces, for the first time, a minimal orthogonal set of graph operations, enabling general-purpose large language models to traverse graph structures in a sequential, transparent, and verifiable manner and thereby accomplish multi-step reasoning. By combining tool-call-driven graph traversal, decomposition of multi-hop queries, and retrieval-augmented generation, the approach substantially outperforms in-context reasoning on both synthetic and enterprise-like knowledge graphs. The performance gain grows more pronounced with model scale, effectively mitigating the performance collapse that traditional methods exhibit on complex queries.
📝 Abstract
The use of knowledge graphs (KGs) for grounding agents in real-world Q&A applications has become increasingly common. Answering complex queries often requires multi-hop reasoning and the ability to navigate vast relational structures. Standard approaches rely on prompting techniques that steer large language models to reason over raw graph context, or on retrieval-augmented generation pipelines where relevant subgraphs are injected into the context. These, however, face severe limitations with enterprise-scale KGs that cannot fit in even the largest context windows available today. We present GraphWalk, a problem-agnostic, training-free, tool-based framework that allows off-the-shelf LLMs to reason through sequential graph navigation, dramatically increasing performance across different tasks. Unlike task-specific agent frameworks that encode domain knowledge into specialized tools, GraphWalk equips the LLM with a minimal set of orthogonal graph operations sufficient to traverse any graph structure. We evaluate whether models equipped with GraphWalk can compose these operations into correct multi-step reasoning chains, where each tool call represents a verifiable step, creating a transparent execution trace. We first demonstrate our approach on maze traversal, a problem that non-reasoning models are completely unable to solve, then present results on graphs resembling real-world enterprise knowledge graphs. To isolate structural reasoning from world knowledge, we evaluate on entirely synthetic graphs with random, non-semantic labels. Our benchmark spans 12 query templates, from basic retrieval to compound first-order logic queries. Results show that tool-based traversal yields substantial and consistent gains over in-context baselines across all model families tested, with gains becoming more pronounced as scale increases, precisely where in-context approaches fail catastrophically.
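To make the idea of composing a minimal, orthogonal set of graph operations into a multi-step reasoning chain concrete, here is a small sketch. The graph, the tool names (`get_neighbors`, `node_exists`), and their signatures are illustrative assumptions, not the paper's actual API; the point is that each call is one verifiable traversal step, so a multi-hop query decomposes into an inspectable sequence of primitive operations rather than a single pass over the raw graph in context.

```python
# Hypothetical sketch, NOT GraphWalk's real tool set: a toy knowledge graph
# as node -> list of (relation, neighbor) edges, plus two primitive tools.
GRAPH = {
    "alice":  [("knows", "bob"), ("works_at", "acme")],
    "bob":    [("works_at", "globex")],
    "acme":   [("located_in", "berlin")],
    "globex": [("located_in", "paris")],
    "berlin": [],
    "paris":  [],
}

def get_neighbors(node, relation=None):
    """Return neighbors of `node`, optionally filtered by edge relation."""
    return [dst for rel, dst in GRAPH.get(node, [])
            if relation is None or rel == relation]

def node_exists(node):
    """Check whether a node id is present in the graph."""
    return node in GRAPH

# A 3-hop query composed from the primitives, one verifiable step at a time:
# "In which city is the employer of Alice's acquaintance located?"
step1 = get_neighbors("alice", "knows")        # -> ['bob']
step2 = get_neighbors(step1[0], "works_at")    # -> ['globex']
step3 = get_neighbors(step2[0], "located_in")  # -> ['paris']
```

An LLM agent would issue each call as a tool invocation and read the result back before deciding the next step, which is what produces the transparent execution trace the abstract describes.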