🤖 AI Summary
Existing neural network explanation methods suffer from insufficient faithfulness and lack effective means to evaluate intermediate-layer reasoning. Method: We propose FEI (Faithfulness-guided Ensemble Interpretation), an explanation framework comprising: (i) a smooth approximation that improves quantitative faithfulness scores; (ii) a novel qualitative faithfulness metric for hidden layers, which makes intermediate reasoning paths assessable; and (iii) variants that combine ensemble-based explanation, hidden-layer encoding analysis, and multi-granularity visualization, so that quantitative and qualitative evaluation complement each other. Results: Extensive experiments show that FEI outperforms state-of-the-art methods across mainstream benchmarks, with substantial gains in visualization fidelity and faithfulness metrics (e.g., a 32.7% reduction in Infidelity and a 28.4% improvement in Erasure). FEI establishes a verifiable, diagnosable explanation paradigm for trustworthy AI.
📝 Abstract
Interpretable and faithful explanations for specific neural inferences are crucial for understanding and evaluating model behavior. Our work introduces Faithfulness-guided Ensemble Interpretation (FEI), an innovative framework that enhances the breadth and effectiveness of faithfulness, advancing interpretability by providing superior visualization. Through an analysis of existing evaluation benchmarks, FEI employs a smooth approximation to elevate quantitative faithfulness scores. Diverse variations of FEI target enhanced faithfulness in hidden layer encodings, expanding interpretability. Additionally, we propose a novel qualitative metric that assesses hidden layer faithfulness. In extensive experiments, FEI surpasses existing methods, demonstrating substantial advances in qualitative visualization and quantitative faithfulness scores. Our research establishes a comprehensive framework for elevating faithfulness in neural network explanations, emphasizing both breadth and precision.