Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering

📅 2025-06-11
🤖 AI Summary
To address the weak generalization and limited structural support of graph-based retrievers in Knowledge Graph Question Answering (KGQA), this paper proposes RAPL, which combines: (1) a two-stage, causally grounded labeling mechanism that pairs heuristic signals with a parametric model to explicitly model the causal relationship between a query and its supporting subgraph; (2) a model-agnostic graph transformation that jointly encodes intra- and inter-triple interactions; and (3) a path-based strategy that decouples retrieval from reasoning and passes the retrieved paths to the reasoner as structured input. On multiple KGQA benchmarks, RAPL outperforms state-of-the-art methods by 2.66%–20.34% and narrows both the performance gap between smaller and more powerful LLM-based reasoners and the gap under cross-dataset settings. It also gives downstream reasoning modules structured inputs that are more interpretable and compatible with different architectures.
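The summary does not say how retrieved paths are decoded or formatted. As a minimal sketch, assuming a trained retriever that assigns each triple a relevance score (the `score` callable, `threshold`, `max_hops`, and the helper names `greedy_path` and `serialize` are hypothetical, not RAPL's actual interface), path-based retrieval for a question like "Where was the director of Inception born?" can be pictured as greedy decoding from the topic entity, with the resulting path serialized as one line of structured text for the downstream reasoner:

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def greedy_path(
    triples: List[Triple],
    start: str,
    score: Callable[[Triple], float],  # hypothetical retriever score per triple
    max_hops: int = 3,
    threshold: float = 0.5,
) -> List[Triple]:
    """Extend a path from `start`, taking the best-scoring outgoing triple at each hop."""
    out_edges: Dict[str, List[Triple]] = {}
    for tr in triples:
        out_edges.setdefault(tr[0], []).append(tr)

    path: List[Triple] = []
    visited = {start}
    node = start
    for _ in range(max_hops):
        # Only follow triples that reach a new entity, so the path never loops.
        candidates = [tr for tr in out_edges.get(node, []) if tr[2] not in visited]
        if not candidates:
            break
        best = max(candidates, key=score)
        if score(best) < threshold:  # stop once no continuation looks relevant
            break
        path.append(best)
        visited.add(best[2])
        node = best[2]
    return path


def serialize(path: List[Triple]) -> str:
    """Render a path as one line of text that an LLM reasoner can read."""
    if not path:
        return ""
    parts = [path[0][0]]
    for _, relation, tail in path:
        parts += [relation, tail]
    return " -> ".join(parts)


kg = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Inception", "written_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
    ("London", "capital_of", "United Kingdom"),
]
# Made-up scores standing in for a trained retriever's output.
scores = {kg[0]: 0.9, kg[1]: 0.4, kg[2]: 0.8, kg[3]: 0.2}
path = greedy_path(kg, "Inception", lambda tr: scores.get(tr, 0.0))
print(serialize(path))  # Inception -> directed_by -> Christopher Nolan -> born_in -> London
```

Greedy decoding keeps the sketch short; a real retriever might keep several candidate paths per question (for example with beam search), which this example does not attempt.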

📝 Abstract
Large Language Models (LLMs) have shown strong inductive reasoning ability across various domains, but their reliability is hindered by outdated knowledge and hallucinations. Retrieval-Augmented Generation (RAG) mitigates these issues by grounding LLMs in external knowledge; however, most existing RAG pipelines rely on unstructured text, which limits interpretability and structured reasoning. Knowledge graphs, which represent facts as relational triples, offer a more structured and compact alternative. Recent studies have explored integrating knowledge graphs with LLMs for knowledge graph question answering (KGQA), with a significant proportion adopting the retrieve-then-reason paradigm. Within this paradigm, graph-based retrievers have demonstrated strong empirical performance, yet they still struggle to generalize. In this work, we propose RAPL, a novel framework for efficient and effective graph retrieval in KGQA. RAPL addresses these limitations through three components: (1) a two-stage labeling strategy that combines heuristic signals with parametric models to provide causally grounded supervision; (2) a model-agnostic graph transformation approach that captures both intra- and inter-triple interactions, thereby enhancing representational capacity; and (3) a path-based reasoning strategy that facilitates learning from the injected rational knowledge and supports the downstream reasoner through structured inputs. Empirically, RAPL outperforms state-of-the-art methods by 2.66%–20.34% and significantly reduces both the performance gap between smaller and more powerful LLM-based reasoners and the gap under cross-dataset settings, highlighting its superior retrieval capability and generalizability. Code is available at: https://github.com/tianyao-aka/RAPL.
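The abstract does not spell out the graph transformation. One common way to expose inter-triple interactions, assumed here for illustration, is a directed line graph: every triple becomes a node, and triple i points to triple j when the tail of i is the head of j. Encoding a node's head, relation, and tail together then covers the intra-triple side. The helper names `to_line_graph` and `node_text` are hypothetical:

```python
from collections import defaultdict
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def to_line_graph(triples: List[Triple]) -> List[Tuple[int, int]]:
    """Return directed edges i -> j between triple-nodes where tail(i) == head(j)."""
    # Index triple positions by head entity so each tail lookup is a dict access.
    by_head = defaultdict(list)
    for j, (h, _, _) in enumerate(triples):
        by_head[h].append(j)
    edges = []
    for i, (_, _, t) in enumerate(triples):
        for j in by_head[t]:
            if j != i:  # skip self-loops created by triples like (a, r, a)
                edges.append((i, j))
    return edges


def node_text(triple: Triple) -> str:
    # Each line-graph node carries the whole triple, so a single encoder call
    # sees head, relation, and tail together (the intra-triple interaction).
    h, r, t = triple
    return f"{h} | {r} | {t}"


kg = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
    ("London", "capital_of", "United Kingdom"),
]
print(to_line_graph(kg))  # [(0, 1), (1, 2)]
print(node_text(kg[1]))  # Christopher Nolan | born_in | London
```

Because this transformation only changes the input graph, any message-passing GNN could run on the resulting node and edge lists, which is one plausible reading of "model-agnostic".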
Problem

Research questions and friction points this paper is trying to address.

Enhancing generalization in graph retrievers for KGQA
Improving structured reasoning with knowledge graphs
Improving LLM reliability through retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage labeling combines heuristic and parametric models
Model-agnostic graph transformation captures triple interactions
Path-based reasoning enhances structured knowledge injection
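The heuristic half of the two-stage labeling can be sketched as follows, assuming (this is not stated in the source) that the heuristic signal is the set of shortest paths from the question's topic entity to an answer entity. The example also shows why a second, parametric stage is useful: both `directed_by` and `written_by` reach the answer in two hops, yet only one may match what the question asks. The helper `shortest_paths` is hypothetical, and the second stage is not implemented here:

```python
from collections import deque
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def shortest_paths(triples: List[Triple], source: str, target: str) -> List[List[Triple]]:
    """Enumerate all shortest directed paths (as triple sequences) from source to target."""
    out_edges: Dict[str, List[Triple]] = {}
    for tr in triples:
        out_edges.setdefault(tr[0], []).append(tr)

    # BFS computes the hop distance from source to every reachable entity.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for _, _, v in out_edges.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if target not in dist:
        return []

    # Following only edges that raise the distance by exactly one, any walk
    # that ends at the target is a shortest path.
    paths: List[List[Triple]] = []

    def extend(node: str, prefix: List[Triple]) -> None:
        if node == target:
            paths.append(list(prefix))
            return
        for tr in out_edges.get(node, []):
            v = tr[2]
            if dist.get(v) == dist[node] + 1 and dist[v] <= dist[target]:
                prefix.append(tr)
                extend(v, prefix)
                prefix.pop()

    extend(source, [])
    return paths


kg = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Inception", "written_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
    ("London", "capital_of", "United Kingdom"),
]
for p in shortest_paths(kg, "Inception", "London"):
    print([r for _, r, _ in p])
# ['directed_by', 'born_in']
# ['written_by', 'born_in']
```

A second stage would then ask a parametric model (for instance an LLM prompted with the question) to keep only the candidate paths that actually explain the answer; that step is deliberately left out of this sketch.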
Authors
Tianjun Yao
PhD student, Mohamed bin Zayed University of Artificial Intelligence
Machine Learning
Haoxuan Li
Mohamed bin Zayed University of Artificial Intelligence, Peking University
Zhiqiang Shen
Mohamed bin Zayed University of Artificial Intelligence
Pan Li
Georgia Institute of Technology
Tongliang Liu
Director, Sydney AI Centre, University of Sydney & Mohamed bin Zayed University of AI
Machine Learning, Learning with Noisy Labels, Trustworthy Machine Learning
Kun Zhang
Mohamed bin Zayed University of Artificial Intelligence, Carnegie Mellon University