GraphDancer: Training LLMs to Explore and Reason over Graphs via Curriculum Reinforcement Learning

📅 2026-01-24

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models struggle to use heterogeneous, graph-structured external knowledge for multi-hop reasoning and evidence aggregation. This work proposes GraphDancer, a two-stage post-training framework in which the model alternates between natural-language reasoning and graph function calls to explore and reason over graphs. A graph-aware curriculum orders training by the structural complexity of information-seeking trajectories, moving from easy to hard to progressively strengthen the model’s interactive capability and reasoning efficiency. With rule-based reward guidance and training on a single domain, a 3B-parameter model outperforms baselines built on a 14B backbone or GPT-4o-mini on both unseen domains and out-of-distribution question types.
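
The alternation between reasoning and graph function calls can be pictured with a small sketch. Everything in it is an assumption made for illustration: the function names `neighbors` and `node_feature`, the toy graph, and the scripted policy that stands in for the LLM are hypothetical and are not taken from the paper's code.

```python
from typing import Any, Callable, Dict, List, Tuple

# Toy heterogeneous graph (made-up data): (node, relation) -> neighbor ids.
EDGES: Dict[Tuple[str, str], List[str]] = {
    ("paper:1", "written_by"): ["author:7"],
    ("author:7", "affiliated_with"): ["org:3"],
}
FEATURES: Dict[str, Dict[str, str]] = {"org:3": {"name": "Example University"}}


def neighbors(node: str, relation: str) -> List[str]:
    """Hypothetical graph function: follow one typed relation from a node."""
    return EDGES.get((node, relation), [])


def node_feature(node: str, key: str) -> str:
    """Hypothetical graph function: read one attribute of a node."""
    return FEATURES.get(node, {}).get(key, "")


TOOLS: Dict[str, Callable[..., Any]] = {
    "neighbors": neighbors,
    "node_feature": node_feature,
}

# One recorded turn: (thought, function name, arguments, observation).
Turn = Tuple[str, str, Tuple[str, ...], Any]


def run_episode(policy: Callable[[List[Turn]], Tuple[str, str, Tuple[str, ...]]],
                max_turns: int = 6) -> Tuple[str, List[Turn]]:
    """Alternate between a reasoning step and a function call until an answer."""
    history: List[Turn] = []
    for _ in range(max_turns):
        thought, action, args = policy(history)
        if action == "answer":
            return args[0], history
        observation = TOOLS[action](*args)  # execute the requested graph call
        history.append((thought, action, args, observation))
    return "", history  # turn budget exhausted without an answer


def scripted_policy(history: List[Turn]) -> Tuple[str, str, Tuple[str, ...]]:
    """Stand-in for the LLM: a fixed three-hop plan over the toy graph."""
    if len(history) == 0:
        return ("Find the paper's author.", "neighbors", ("paper:1", "written_by"))
    if len(history) == 1:
        author = history[0][3][0]
        return ("Find the author's organization.", "neighbors",
                (author, "affiliated_with"))
    if len(history) == 2:
        org = history[1][3][0]
        return ("Read the organization's name.", "node_feature", (org, "name"))
    return ("The evidence is complete.", "answer", (history[2][3],))


answer, trace = run_episode(scripted_policy)
```

In a trained system the policy would be the language model itself, and each observation would be appended to its context before the next reasoning step.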

📝 Abstract

Large language models (LLMs) increasingly rely on external knowledge to improve factuality, yet many real-world knowledge sources are organized as heterogeneous graphs rather than plain text. Reasoning over such graph-structured knowledge poses two key challenges: (1) navigating structured, schema-defined relations requires precise function calls rather than similarity-based retrieval, and (2) answering complex questions often demands multi-hop evidence aggregation through iterative information seeking. We propose GraphDancer, a reinforcement learning (RL) framework that teaches LLMs to navigate graphs by interleaving reasoning and function execution. To make RL effective for moderate-sized LLMs, we introduce a graph-aware curriculum that schedules training by the structural complexity of information-seeking trajectories using an easy-to-hard biased sampler. We evaluate GraphDancer on a multi-domain benchmark by training on one domain only and testing on unseen domains and out-of-distribution question types. Despite using only a 3B backbone, GraphDancer outperforms baselines equipped with either a 14B backbone or GPT-4o-mini, demonstrating robust cross-domain generalization of graph exploration and reasoning skills. Our code and models can be found at https://yuyangbai.com/graphdancer/ .
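
One way to read the "easy-to-hard biased sampler" is as a weighting scheme that favors structurally simple questions early in training and complex ones later, without ever excluding either. The sketch below is a guess at that idea, not the authors' sampler: the per-question difficulty score (for example, a hop count), the exponential tilt, and the `sharpness` parameter are all assumptions.

```python
import math
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_weights(difficulties: Sequence[float], progress: float,
                       sharpness: float = 2.0) -> List[float]:
    """Unnormalized sampling weights that tilt from easy to hard.

    difficulties: assumed structural-complexity score per question.
    progress: fraction of training completed, in [0, 1].
    sharpness: assumed knob controlling how strong the bias is.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0, 1]")
    lo, hi = min(difficulties), max(difficulties)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores match
    tilt = sharpness * (2.0 * progress - 1.0)  # negative early, positive late
    # Normalized level is 0 for the easiest question and 1 for the hardest.
    return [math.exp(tilt * (d - lo) / span) for d in difficulties]


def sample_batch(questions: Sequence[T], difficulties: Sequence[float],
                 progress: float, k: int, seed: int = 0) -> List[T]:
    """Draw k questions with replacement under the curriculum weights."""
    rng = random.Random(seed)
    weights = curriculum_weights(difficulties, progress)
    return rng.choices(list(questions), weights=weights, k=k)


# Hypothetical questions labeled by hop count.
questions = ["1-hop", "2-hop", "3-hop"]
hops = [1, 2, 3]
early = curriculum_weights(hops, progress=0.0)  # favors "1-hop"
late = curriculum_weights(hops, progress=1.0)   # favors "3-hop"
batch = sample_batch(questions, hops, progress=0.0, k=8)
```

Because every weight stays positive, hard questions can still appear early and easy ones late, which is what separates a biased sampler from a strict staged curriculum.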

Problem

Research questions and friction points this paper is trying to address.

graph reasoning

large language models

heterogeneous graphs

external knowledge

multi-hop reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

graph reasoning

curriculum post-training

two-stage training