Rule2Text: A Framework for Generating and Evaluating Natural Language Explanations of Knowledge Graph Rules

📅 2025-08-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Logical rules mined from knowledge graphs are often hard for humans to interpret because of their rigid formal notation and strong domain dependence. To address this, we propose Rule2Text, a framework that uses large language models (LLMs) to automatically translate formal logical rules into natural language explanations. The method combines zero-shot and few-shot prompting, variable-type injection, and chain-of-thought reasoning; initial explanations are generated with Gemini 2.0 Flash, and the open-source Zephyr model is then fine-tuned to improve fidelity and fluency. We further introduce an LLM-as-a-judge mechanism for automated evaluation and construct a high-quality annotated dataset that supports type-agnostic knowledge graphs. Experiments show that the fine-tuned model substantially improves explanation quality, with especially strong gains in complex domains such as biomedicine, and that automatic evaluation scores correlate strongly with human judgments (Spearman's ρ > 0.9), validating the framework's effectiveness and generalizability.
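The summary mentions few-shot prompting with variable-type injection for translating mined rules. A minimal sketch of how such a prompt might be assembled is shown below; the rule syntax follows AMIE-style Horn rules, but all function names, relation names, and the prompt wording are illustrative assumptions, not the paper's actual prompts.

```python
def format_rule(body_atoms, head_atom):
    """Render an AMIE-style Horn rule: body atoms => head atom."""
    body = ", ".join(f"{r}({a}, {b})" for r, a, b in body_atoms)
    r, a, b = head_atom
    return f"{body} => {r}({a}, {b})"

def build_prompt(rule, var_types, examples):
    """Assemble a few-shot prompt with injected variable-type hints."""
    type_hints = "; ".join(f"{v} is a {t}" for v, t in var_types.items())
    shots = "\n\n".join(
        f"Rule: {ex_rule}\nExplanation: {ex_expl}"
        for ex_rule, ex_expl in examples
    )
    return (
        "Translate the logical rule into a clear English explanation.\n\n"
        f"{shots}\n\n"
        f"Rule: {rule}\nVariable types: {type_hints}\nExplanation:"
    )

# Hypothetical rule: people who speak a country's official language live there.
rule = format_rule(
    [("speaks", "?a", "?b"), ("official_language", "?c", "?b")],
    ("lives_in", "?a", "?c"),
)
prompt = build_prompt(
    rule,
    {"?a": "person", "?b": "language", "?c": "country"},
    [("born_in(?x, ?y) => citizen_of(?x, ?y)",
      "If a person was born in a country, they are likely a citizen of it.")],
)
print(prompt)
```

The assembled string would then be sent to the LLM (e.g. Gemini 2.0 Flash in the paper's setup); the type hints give the model the entity-class context that raw KG variable names lack.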

📝 Abstract
Knowledge graphs (KGs) can be enhanced through rule mining; however, the resulting logical rules are often difficult for humans to interpret due to their inherent complexity and the idiosyncratic labeling conventions of individual KGs. This work presents Rule2Text, a comprehensive framework that leverages large language models (LLMs) to generate natural language explanations for mined logical rules, thereby improving KG accessibility and usability. We conduct extensive experiments using multiple datasets, including Freebase variants (FB-CVT-REV, FB+CVT-REV, and FB15k-237) as well as the ogbl-biokg dataset, with rules mined using AMIE 3.5.1. We systematically evaluate several LLMs across a comprehensive range of prompting strategies, including zero-shot, few-shot, variable type incorporation, and Chain-of-Thought reasoning. To systematically assess models' performance, we conduct a human evaluation of generated explanations on correctness and clarity. To address evaluation scalability, we develop and validate an LLM-as-a-judge framework that demonstrates strong agreement with human evaluators. Leveraging the best-performing model (Gemini 2.0 Flash), LLM judge, and human-in-the-loop feedback, we construct high-quality ground truth datasets, which we use to fine-tune the open-source Zephyr model. Our results demonstrate significant improvements in explanation quality after fine-tuning, with particularly strong gains in the domain-specific dataset. Additionally, we integrate a type inference module to support KGs lacking explicit type information. All code and data are publicly available at https://github.com/idirlab/KGRule2NL.
Problem

Research questions and friction points this paper is trying to address.

Generating natural language explanations for complex knowledge graph rules
Improving human interpretability of mined logical rules in KGs
Evaluating and scaling explanation quality using LLMs and human feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-generated natural language explanations for rules
Fine-tuning open-source models with human feedback
Type inference module for KGs lacking types
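The LLM-as-a-judge evaluation rates explanations on correctness and clarity, the same two criteria used in the paper's human evaluation. A minimal sketch of the judge side is below; the prompt wording, 1 to 5 scale, and reply format are illustrative assumptions rather than the paper's actual rubric.

```python
import re

# Hypothetical judge prompt; the paper's actual rubric may differ.
JUDGE_TEMPLATE = (
    "You are a strict evaluator. Given a logical rule and a candidate "
    "natural language explanation, rate the explanation.\n"
    "Rule: {rule}\n"
    "Explanation: {explanation}\n"
    "Reply exactly as: Correctness: <1-5>, Clarity: <1-5>"
)

def parse_judge_reply(reply):
    """Extract the two 1-5 ratings from a judge reply; None if malformed."""
    m = re.search(r"Correctness:\s*([1-5])\s*,\s*Clarity:\s*([1-5])", reply)
    if not m:
        return None
    return {"correctness": int(m.group(1)), "clarity": int(m.group(2))}

prompt = JUDGE_TEMPLATE.format(
    rule="speaks(?a, ?b), official_language(?c, ?b) => lives_in(?a, ?c)",
    explanation="If a person speaks a country's official language, "
                "they likely live in that country.",
)
# The prompt would be sent to the judge LLM; here we parse a sample reply.
print(parse_judge_reply("Correctness: 5, Clarity: 4"))
```

Scores parsed this way can be aggregated per model and compared against human ratings, which is how the reported Spearman correlation above 0.9 between judge and human scores would be computed.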