🤖 AI Summary
To address the low faithfulness of natural language explanations (NLEs), which often fail to reflect a model's actual reasoning path, this paper proposes G-Tex, a graph-guided textual explanation generation framework. Methodologically, G-Tex represents high-faithfulness highlight explanations as graph structures, encodes these reasoning cues with graph neural networks (GNNs), and injects them into the T5/BART generation process so that generated NLEs stay aligned with the model's internal reasoning. Its two-stage paradigm jointly optimizes highlight extraction, GNN-based representation learning, sequence generation, and semantic similarity. Evaluated on three reasoning benchmarks, G-Tex improves NLE faithfulness by up to 12.18%. The generated explanations show greater semantic and lexical fidelity to human annotations, and human evaluation confirms reduced redundancy and higher overall quality, balancing interpretability and verifiability.
📄 Abstract
Natural language explanations (NLEs) are commonly used to provide plausible free-text explanations of a model's reasoning about its predictions. However, recent work has questioned their faithfulness, as they may not accurately reflect the model's internal reasoning process regarding its predicted answer. In contrast, highlight explanations--input fragments critical for the model's predicted answers--exhibit measurable faithfulness. Building on this foundation, we propose G-Tex, a Graph-Guided Textual Explanation Generation framework designed to enhance the faithfulness of NLEs. Specifically, highlight explanations are first extracted as faithful cues reflecting the model's reasoning logic toward answer prediction. They are subsequently encoded through a graph neural network layer to guide the NLE generation, which aligns the generated explanations with the model's underlying reasoning toward the predicted answer. Experiments on T5 and BART using three reasoning datasets show that G-Tex improves NLE faithfulness by up to 12.18% compared to baseline methods. Additionally, G-Tex generates NLEs with greater semantic and lexical similarity to human-written ones. Human evaluations show that G-Tex can decrease redundant content and enhance the overall quality of NLEs. Our work presents a novel method for explicitly guiding NLE generation to enhance faithfulness, serving as a foundation for addressing broader criteria in NLE and generated text.
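The two-stage pipeline described in the abstract (extract highlight explanations, then encode them with a GNN layer to guide generation) can be sketched in minimal, stdlib-only Python. This is an illustrative toy, not the paper's implementation: the attribution scorer, the chain-graph construction over highlights, and the single mean-aggregation message-passing step are all hypothetical stand-ins, and in the full framework the resulting guidance vectors would condition a T5/BART decoder rather than be printed.

```python
# Illustrative sketch of G-Tex's two-stage idea (hypothetical, not the authors' code):
# stage 1 extracts highlight tokens as faithful cues; stage 2 runs one round of
# graph message passing over them to produce guidance vectors for generation.

def extract_highlights(tokens, scores, k=3):
    """Keep the k tokens with the highest attribution scores (toy scorer)."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # indices of highlighted tokens, in input order

def gnn_layer(node_feats, edges):
    """One mean-aggregation message-passing step over the highlight graph."""
    neighbors = {i: [] for i in range(len(node_feats))}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    out = []
    for i, feat in enumerate(node_feats):
        msgs = [node_feats[j] for j in neighbors[i]] or [feat]
        agg = [sum(dim) / len(msgs) for dim in zip(*msgs)]
        # combine each node's own feature with its aggregated neighbor message
        out.append([(a + b) / 2 for a, b in zip(feat, agg)])
    return out

# Toy input: 5 tokens with hypothetical attribution scores.
tokens = ["the", "cat", "chased", "the", "mouse"]
scores = [0.1, 0.9, 0.8, 0.1, 0.7]
idx = extract_highlights(tokens, scores, k=3)   # -> [1, 2, 4]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # toy embeddings for the highlights
edges = [(0, 1), (1, 2)]                        # chain graph over the 3 highlights
guidance = gnn_layer(feats, edges)              # vectors that would guide decoding
print([tokens[i] for i in idx], guidance)
```

In the actual framework these guidance representations are fused into the seq2seq model so the generated NLE is traceable to the highlighted input fragments; the sketch only shows the data flow, not the joint training objective.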