Self-Exploring Language Models for Explainable Link Forecasting on Temporal Graphs via Reinforcement Learning

📅 2025-08-31
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing temporal graph neural networks lack interpretability and do not generalize to unseen graph structures; meanwhile, work on large language models (LLMs) for temporal graph reasoning remains confined to static or small synthetic graphs, with no systematic evaluation of reasoning-trace quality. Method: We propose ReaL-TG, a framework that introduces reinforcement learning to temporal graph link prediction for the first time, enabling an LLM (Qwen3-4B) to autonomously discover interpretable, history-aware reasoning paths. It combines structural state modeling with an outcome-based reward design and employs a dual-track evaluation protocol that pairs ranking metrics with LLM-as-a-Judge assessment. Contribution/Results: ReaL-TG-4B achieves state-of-the-art predictive performance on real-world temporal graphs, outperforming larger models such as GPT-5 mini. Its generated explanations show high fidelity and low hallucination rates, validated by both automated and human evaluation.
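The summary does not spell out the reward function, so the following is only a minimal sketch of an outcome-based reward for link forecasting, assuming the model emits a ranked candidate list and is scored by the reciprocal rank of the ground-truth target (function and variable names are hypothetical):

```python
def outcome_reward(ranked_candidates, true_target):
    """Reciprocal-rank reward: 1/rank if the ground-truth target appears
    in the model's ranked prediction list, else 0. Only the outcome is
    scored; the reasoning trace itself receives no direct reward, which
    leaves the model free to self-explore reasoning strategies."""
    try:
        rank = ranked_candidates.index(true_target) + 1  # 1-based rank
    except ValueError:
        return 0.0  # ground-truth target not predicted at all
    return 1.0 / rank

# Example: the true target is ranked 2nd among the predictions.
print(outcome_reward(["alice", "bob", "carol"], "bob"))  # 0.5
```

An outcome-only signal like this is what makes the "self-exploring" framing possible: the policy is never told which reasoning path to take, only whether its final prediction ranked the true node highly.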

๐Ÿ“ Abstract
Forecasting future links is a central task in temporal graph (TG) reasoning, requiring models to leverage historical interactions to predict upcoming ones. Traditional neural approaches, such as temporal graph neural networks, achieve strong performance but lack explainability and cannot be applied to unseen graphs without retraining. Recent studies have begun to explore using large language models (LLMs) for graph reasoning, but most are constrained to static graphs or small synthetic TGs and lack evaluation of the quality of the reasoning traces LLMs generate. In this work, we present Reasoning-Enhanced Learning for Temporal Graphs (ReaL-TG), a reinforcement learning framework that fine-tunes LLMs to perform explainable link forecasting on real-world TGs. ReaL-TG uses an outcome-based reward to encourage models to self-explore reasoning strategies from graph structure and to produce explanations that directly justify their predictions. To enable evaluation of LLM-generated reasoning traces, we propose a new evaluation protocol combining ranking metrics with an LLM-as-a-Judge system that assesses both the quality of reasoning and the impact of hallucinations. Experiments with ReaL-TG-4B, obtained by fine-tuning Qwen3-4B under our framework, show that it outperforms much larger frontier LLMs, including GPT-5 mini, on ranking metrics, while producing high-quality explanations confirmed by both the LLM judge and human evaluation.
Problem

Research questions and friction points this paper is trying to address.

Enabling explainable link forecasting on temporal graphs using reinforcement learning
Addressing lack of explainability in traditional temporal graph neural networks
Evaluating quality of reasoning traces and hallucinations in LLM predictions
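One concrete way to ground the hallucination assessment mentioned above is to check every edge a reasoning trace cites against the graph's observed history; this is a hypothetical sketch of such a faithfulness check, not the paper's actual judge:

```python
def hallucinated_edges(cited_edges, history):
    """Return the cited (src, dst, timestamp) edges that never occurred
    in the observed history -- a simple faithfulness check flagging
    interactions a reasoning trace invented. `history` is the set of
    ground-truth timestamped edges available before the query time."""
    return [e for e in cited_edges if e not in history]

history = {("u1", "u2", 5), ("u2", "u3", 7)}
cited = [("u1", "u2", 5), ("u1", "u3", 6)]  # second edge never happened
print(hallucinated_edges(cited, history))  # [('u1', 'u3', 6)]
```

A check like this only catches fabricated edges; judging whether the cited history actually supports the prediction still needs the LLM judge or a human.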
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning fine-tuning of LLMs for temporal graph link forecasting
Outcome-based reward that lets the model self-explore reasoning strategies from graph structure
LLM-as-a-Judge protocol assessing reasoning quality and hallucination impact