Exploring the Influence of Relevant Knowledge for Natural Language Generation Interpretability

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates how external knowledge integration affects the explainability of commonsense-oriented natural language generation (NLG). To address a key limitation of existing evaluations, namely their overreliance on superficial metrics, we propose a three-stage explainability assessment framework and introduce KITGI, a novel benchmark that integrates ConceptNet semantic relations with human annotations to enable controlled knowledge-ablation studies. Experiments with T5-Large demonstrate that full knowledge input yields 91% correctness in generated outputs, whereas removing critical knowledge drops performance to just 6%, underscoring the decisive role of external knowledge in ensuring reasoning coherence and conceptual completeness. Our key contributions are: (1) the first explainability evaluation framework specifically designed for commonsense NLG; (2) KITGI, a knowledge-sensitive benchmark enabling fine-grained diagnostic analysis; and (3) empirical evidence establishing a causal link among knowledge grounding, reasoning fidelity, and generation quality, shifting evaluation paradigms from surface-level consistency toward traceable, inference-aware assessment.

📝 Abstract
This paper explores the influence of external knowledge integration in Natural Language Generation (NLG), focusing on a commonsense generation task. We extend the CommonGen dataset by creating KITGI, a benchmark that pairs input concept sets with retrieved semantic relations from ConceptNet and includes manually annotated outputs. Using the T5-Large model, we compare sentence generation under two conditions: with full external knowledge and with filtered knowledge where highly relevant relations were deliberately removed. Our interpretability benchmark follows a three-stage method: (1) identifying and removing key knowledge, (2) regenerating sentences, and (3) manually assessing outputs for commonsense plausibility and concept coverage. Results show that sentences generated with full knowledge achieved 91% correctness across both criteria, while filtering reduced performance drastically to 6%. These findings demonstrate that relevant external knowledge is critical for maintaining both coherence and concept coverage in NLG. This work highlights the importance of designing interpretable, knowledge-enhanced NLG systems and calls for evaluation frameworks that capture the underlying reasoning beyond surface-level metrics.
Problem

Research questions and friction points this paper is trying to address.

Investigating how external knowledge affects NLG interpretability in commonsense generation
Evaluating performance degradation when relevant semantic relations are removed
Establishing a benchmark to assess the importance of knowledge for coherence and concept coverage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extended the CommonGen dataset with ConceptNet semantic relations
Compared T5-Large generation under full versus filtered knowledge conditions
Applied a three-stage method to assess commonsense plausibility and concept coverage
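The knowledge-ablation setup above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the relevance heuristic (a relation counts as "highly relevant" when it connects two input concepts), the triple format, and the prompt serialization are all assumptions made for the example, and the actual T5-Large generation step is omitted.

```python
# Hypothetical sketch of the three-stage knowledge-ablation protocol.
# ConceptNet knowledge is represented as (head, relation, tail) triples;
# the relevance rule and prompt format are illustrative assumptions.

def relevance(triple, concepts):
    """Score a triple by how many input concepts it touches (0, 1, or 2)."""
    head, _, tail = triple
    return sum(c in (head, tail) for c in concepts)

def filter_knowledge(triples, concepts, keep_relevant=True):
    """Stage 1: keep all knowledge, or deliberately remove the highly
    relevant relations (those connecting two input concepts)."""
    if keep_relevant:
        return list(triples)
    return [t for t in triples if relevance(t, concepts) < 2]

def build_prompt(concepts, triples):
    """Serialize the concept set plus knowledge into one model input string
    (Stage 2 would feed this to the generator, e.g. T5-Large)."""
    facts = "; ".join(f"{h} {r} {t}" for h, r, t in triples)
    return f"concepts: {', '.join(concepts)} | knowledge: {facts}"

concepts = ["dog", "frisbee", "catch"]
triples = [
    ("dog", "CapableOf", "catch"),    # connects two input concepts
    ("frisbee", "UsedFor", "catch"),  # connects two input concepts
    ("dog", "IsA", "animal"),         # touches only one concept
]

full_prompt = build_prompt(concepts, filter_knowledge(triples, concepts, True))
ablated_prompt = build_prompt(concepts, filter_knowledge(triples, concepts, False))
```

Stage 3, the manual assessment of commonsense plausibility and concept coverage, compares the sentences generated from `full_prompt` against those from `ablated_prompt`.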