SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering

📅 2026-05-04

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses the susceptibility of existing process reward models to the risk compensation effect in multi-hop reasoning over knowledge graphs: erroneous intermediate steps can be masked by subsequent correct ones, so flawed reasoning paths receive undeservedly high rewards. This is a critical issue in high-stakes domains such as healthcare and law. To mitigate it, the authors propose a schema-aware cumulative process reward model that conditions on the reasoning prefix and measures the schema distance between the current reasoning step and the implicit target parsed from the query, thereby delivering both cumulative and forward-looking rewards. Integrated into a Monte Carlo Tree Search (MCTS) framework, the model makes path evaluation more accurate and more risk-sensitive. Empirical results on medical and legal knowledge graphs as well as the ComplexWebQuestions (CWQ) dataset show an average 1.18% improvement in Hits@k over strong baselines.
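The risk compensation effect described above can be illustrated with a toy comparison. The sketch below is not the paper's reward formulation: it assumes hypothetical per-step scores in [0, 1], uses a plain average as a stand-in for a step-wise aggregate that lets later steps offset an earlier error, and uses a running product as one simple way to make the reward of each prefix depend on everything before it.

```python
def mean_reward(step_scores):
    """Average of step scores: a bad step can be offset by later good ones."""
    return sum(step_scores) / len(step_scores)


def cumulative_rewards(step_scores):
    """Running product of step scores: one reward per reasoning prefix.

    Assumption for illustration only; the paper's actual cumulative
    reward is not specified in this summary.
    """
    rewards, running = [], 1.0
    for score in step_scores:
        running *= score
        rewards.append(running)
    return rewards


# Hypothetical step scores for two 4-hop reasoning paths.
flawed_path = [0.95, 0.10, 0.95, 0.95]  # one risky hop, then confident hops
sound_path = [0.70, 0.70, 0.70, 0.70]   # uniformly reasonable hops

# Averaging ranks the flawed path higher (risk compensation).
print(mean_reward(flawed_path) > mean_reward(sound_path))

# The cumulative score keeps the penalty from the risky hop.
print(cumulative_rewards(flawed_path)[-1] < cumulative_rewards(sound_path)[-1])
```

Because the running product never increases, a prefix that contains a risky step stays penalized however confident the later steps are, which is the behavior a risk-sensitive evaluator wants.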

📝 Abstract

Large language models excel at complex reasoning, yet evaluating their intermediate steps remains challenging. Although process reward models provide step-wise supervision, they often suffer from a risk compensation effect, where incorrect steps are offset by later correct ones, so that flawed reasoning paths receive high rewards. This issue is further exacerbated in knowledge graph (KG) reasoning, because multiple paths may exist between the start and end entities in a KG, and a single risky step can make the whole reasoning path flawed. These limitations are problematic in risk-sensitive tasks such as medical and legal KG reasoning. To address these issues, we propose a Schema-aware Cumulative Process Reward Model (SCPRM) that evaluates reasoning paths by conditioning on the reasoning prefix and incorporating the schema distance between the current reasoning step and the implicit target parsed from the query, which provides cumulative and future rewards to guide path exploration. We further integrate SCPRM into Monte Carlo Tree Search (MCTS) as SCPRM-MCTS to conduct multi-hop reasoning on KGs for question answering (QA) tasks. Across medical and legal KGQA and CWQ, SCPRM-MCTS improves Hits@k by an average of 1.18% over strong baselines, demonstrating more accurate and risk-sensitive reasoning evaluation.
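The abstract does not define schema distance. One plausible reading, sketched here purely as an assumption, is the number of hops between entity types in the KG schema: the type reached by the current step versus the target type implied by the query. The schema, the type names, and the `schema_distance` helper below are all hypothetical and serve only to make the idea concrete.

```python
from collections import deque


def schema_distance(schema, source_type, target_type):
    """Breadth-first search over a type-level schema graph.

    Returns the minimum number of relation hops from source_type to
    target_type, or None if the target type is unreachable.
    """
    if source_type == target_type:
        return 0
    seen = {source_type}
    queue = deque([(source_type, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in schema.get(node, ()):
            if neighbor == target_type:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical medical schema: entity type -> types reachable in one hop.
toy_schema = {
    "Symptom": ["Disease"],
    "Disease": ["Drug", "Gene"],
    "Drug": ["SideEffect"],
    "Gene": [],
}

print(schema_distance(toy_schema, "Symptom", "SideEffect"))  # 3
print(schema_distance(toy_schema, "Drug", "SideEffect"))     # 1
print(schema_distance(toy_schema, "Gene", "SideEffect"))     # None
```

Under this reading, a step that lands on a type far from, or disconnected from, the target type would earn a lower forward-looking reward, which is how such a signal could steer MCTS away from branches that cannot reach the answer.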

Problem

Research questions and friction points this paper is trying to address.

process reward model

risk compensation effect

knowledge graph question answering

reasoning path evaluation

risk-sensitive reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Schema-aware

Cumulative Process Reward

Risk-sensitive Reasoning