🤖 AI Summary
This work addresses the limited exploration ability of agents in knowledge graph question answering (KGQA), where training data is scarce and reasoning generalizes poorly. The authors propose GraphWalker, a framework that combines automatically synthesized, structurally diverse trajectories with a two-stage supervised fine-tuning (SFT) strategy. The agent is first fine-tuned on diverse exploration trajectories generated via constrained random walks, then fine-tuned on a small set of expert trajectories to develop reflection and error recovery. A lightweight reinforcement learning stage follows the two SFT stages. This approach removes the reliance on predefined paths and achieves state-of-the-art performance on CWQ and WebQSP, while improving generalization to out-of-distribution reasoning paths on GrailQA and the newly introduced GraphWalkerBench benchmark.
📝 Abstract
Agentic knowledge graph question answering (KGQA) requires an agent to iteratively interact with knowledge graphs (KGs), which poses two challenges: training data scarcity and reasoning generalization. Specifically, existing approaches often restrict agent exploration: prompting-based methods lack autonomous navigation training, while current training pipelines usually confine reasoning to predefined trajectories. To this end, this paper proposes GraphWalker, a novel agentic KGQA framework that addresses these challenges through Automated Trajectory Synthesis and Stage-wise Fine-tuning. GraphWalker adopts a two-stage supervised fine-tuning (SFT) paradigm: first, the agent is trained on structurally diverse trajectories synthesized from constrained random-walk paths, establishing a broad exploration prior over the KG; second, the agent is further fine-tuned on a small set of expert trajectories to develop reflection and error recovery capabilities. Extensive experiments demonstrate that our stage-wise SFT paradigm unlocks a higher performance ceiling for a lightweight reinforcement learning (RL) stage, enabling GraphWalker to achieve state-of-the-art performance on CWQ and WebQSP. Additional results on GrailQA and our constructed GraphWalkerBench confirm that GraphWalker enhances generalization to out-of-distribution reasoning paths. The code is publicly available at https://github.com/XuShuwenn/GraphWalker
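
To make the idea of synthesizing paths from constrained random walks more concrete, the following is a minimal Python sketch and not the authors' implementation. The abstract does not say which constraints GraphWalker applies, so the two used here (a hop limit and no revisited entities) are assumptions. The toy graph and the names `TOY_KG`, `constrained_random_walk`, and `sample_paths` are hypothetical and exist only for illustration.

```python
import random

# Toy knowledge graph as adjacency lists: head -> [(relation, tail), ...].
# Hypothetical example data; a real KG such as Freebase is far larger.
TOY_KG = {
    "Inception": [("directed_by", "Christopher Nolan"), ("released_in", "2010")],
    "Christopher Nolan": [("born_in", "London"), ("directed", "Interstellar")],
    "Interstellar": [("released_in", "2014"), ("directed_by", "Christopher Nolan")],
    "London": [("capital_of", "United Kingdom")],
}


def constrained_random_walk(kg, start, max_hops, rng):
    """Sample one path of (head, relation, tail) triples starting at `start`.

    Constraints assumed for this sketch: at most `max_hops` steps, and
    no entity is visited twice, so the path cannot loop back on itself.
    """
    path, visited, current = [], {start}, start
    for _ in range(max_hops):
        # Keep only outgoing edges that lead to an unvisited entity.
        candidates = [(r, t) for r, t in kg.get(current, []) if t not in visited]
        if not candidates:
            break  # dead end: stop early with a shorter path
        relation, tail = rng.choice(candidates)
        path.append((current, relation, tail))
        visited.add(tail)
        current = tail
    return path


def sample_paths(kg, start, max_hops, num_paths, seed=0):
    """Run several walks and return the distinct paths in sorted order."""
    rng = random.Random(seed)  # seeded so that sampling is reproducible
    unique = {
        tuple(constrained_random_walk(kg, start, max_hops, rng))
        for _ in range(num_paths)
    }
    return sorted(unique)


if __name__ == "__main__":
    for path in sample_paths(TOY_KG, "Inception", max_hops=3, num_paths=20):
        print(" -> ".join(f"{h} [{r}] {t}" for h, r, t in path))
```

Each sampled path could then serve as the skeleton of a synthetic question and exploration trajectory. How GraphWalker turns paths into training trajectories is not described in the abstract, so that step is left out of the sketch.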