🤖 AI Summary
Knowledge graph question answering (KGQA) often struggles to effectively retrieve evidence due to the absence of intermediate supervision signals at the path or subgraph level. This work proposes PathISE, a framework that automatically generates high-quality, reusable path-level pseudo-supervision signals using only answer-level labels. By employing a lightweight Transformer-based evaluator to estimate informativeness and perform supervision distillation, PathISE guides large language models to produce compact and interpretable reasoning paths. Notably, the approach requires no human annotation or additional labeling from large models, enabling effective joint reasoning between large language models and knowledge graphs. Evaluated on three KGQA benchmarks, PathISE achieves state-of-the-art or competitive performance, and its generated supervision signals substantially enhance existing question-answering models.
📝 Abstract
Knowledge Graph Question Answering (KGQA) aims to answer user questions by reasoning over Knowledge Graphs (KGs). Recent KGQA methods mainly follow the retrieval-augmented generation paradigm to ground Large Language Models (LLMs) with structured knowledge from KGs. However, training effective models to retrieve question-relevant evidence from KGs typically requires high-quality intermediate supervision signals, such as question-relevant paths or subgraphs, which are time- and resource-intensive to obtain. We propose PathISE, a novel framework for learning high-quality intermediate supervision from answer-level labels. PathISE introduces a lightweight transformer-based estimator that estimates the informativeness of relation paths to construct pseudo path-level supervision. This supervision is then distilled into an LLM path generator, whose generated paths are grounded in the KG to provide compact evidence for inductive answer reasoning. Extensive experiments on three KGQA benchmarks show that PathISE achieves competitive or state-of-the-art KGQA performance, and provides reusable supervision signals that can enhance existing KGQA models, without relying on costly LLM-refined supervision signals. Our source code is available at https://anonymous.4open.science/r/PathISE-2F87.
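To make the idea of deriving path-level pseudo-supervision from answer-level labels concrete, the following is a minimal sketch and not the PathISE implementation. It assumes a toy KG of (head, relation, tail) triples and replaces the learned transformer-based estimator with a simple stand-in score: the fraction of entities reached by a relation path that are gold answers. All names (`TRIPLES`, `ground`, `pseudo_supervision`) and the scoring rule are hypothetical and chosen only for illustration.

```python
# Toy KG as (head, relation, tail) triples. Illustrative data only.
TRIPLES = [
    ("Obama", "born_in", "Honolulu"),
    ("Honolulu", "located_in", "Hawaii"),
    ("Obama", "spouse", "Michelle"),
    ("Michelle", "born_in", "Chicago"),
    ("Chicago", "located_in", "Illinois"),
    ("Obama", "lived_in", "Hawaii"),
    ("Obama", "lived_in", "Illinois"),
]


def build_index(triples):
    """Map head -> relation -> set of tails for fast traversal."""
    index = {}
    for head, rel, tail in triples:
        index.setdefault(head, {}).setdefault(rel, set()).add(tail)
    return index


def enumerate_relation_paths(index, start, max_hops):
    """Collect every relation sequence of length <= max_hops leaving `start`."""
    paths = set()
    stack = [(start, ())]
    while stack:
        entity, path = stack.pop()
        if len(path) == max_hops:
            continue
        for rel, tails in index.get(entity, {}).items():
            new_path = path + (rel,)
            paths.add(new_path)
            for tail in tails:
                stack.append((tail, new_path))
    return paths


def ground(index, start, relation_path):
    """Follow a relation path from `start`; return the set of entities reached."""
    frontier = {start}
    for rel in relation_path:
        frontier = {t for e in frontier for t in index.get(e, {}).get(rel, ())}
    return frontier


def pseudo_supervision(triples, topic, answers, max_hops=2, top_k=1):
    """Rank relation paths using answer labels only.

    Assumption: the score is answer precision of the grounded path, a
    heuristic stand-in for a learned informativeness estimator.
    """
    index = build_index(triples)
    scored = []
    for path in enumerate_relation_paths(index, topic, max_hops):
        reached = ground(index, topic, path)
        hits = reached & answers
        if hits:
            scored.append((len(hits) / len(reached), path))
    # Highest score first; break ties by shorter path, then lexicographically.
    scored.sort(key=lambda item: (-item[0], len(item[1]), item[1]))
    return scored[:top_k]


if __name__ == "__main__":
    # Question: "In which state was Obama born?" with answer label {"Hawaii"}.
    for score, path in pseudo_supervision(TRIPLES, "Obama", {"Hawaii"}, top_k=2):
        print(f"{score:.2f}", " -> ".join(path))
```

In this toy case the path `born_in -> located_in` reaches only the answer, while `lived_in` also reaches a non-answer entity and is ranked lower. This illustrates why answer-level labels alone can separate informative paths from spurious ones; a learned estimator, as described in the abstract, would replace the precision heuristic.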