Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach

📅 2024-07-30

🏛️ Conference on Empirical Methods in Natural Language Processing

📈 Citations: 0

✨ Influential: 0

career value

189K/year

🤖 AI Summary

Existing post-hoc image classification explanation methods struggle to simultaneously achieve high faithfulness and plausibility. This paper proposes a model-agnostic natural language explanation pipeline that requires no modification to the original model training. First, it localizes critical neurons via CNN feature attribution and generates activation maps to construct structured semantic representations. Subsequently, a large language model (LLM) translates these representations into human-readable textual explanations. To our knowledge, this is the first method to significantly improve faithfulness while preserving high plausibility. Neural intervention experiments demonstrate that its critical-neuron masking efficacy is three times that of baseline approaches. The method achieves state-of-the-art performance in both human evaluation and automated faithfulness metrics (e.g., deletion/insertion scores), establishing a novel paradigm for post-hoc interpretability in vision models.

Technology Category

Application Category

📝 Abstract

Existing explanation methods for image classification struggle to provide faithful and plausible explanations. This paper addresses this issue by proposing a post-hoc natural language explanation method that can be applied to any CNN-based classifier without altering its training process or affecting predictive performance. By analysing influential neurons and the corresponding activation maps, the method generates a faithful description of the classifier's decision process in the form of a structured meaning representation, which is then converted into text by a language model. Through this pipeline approach, the generated explanations are grounded in the neural network architecture, providing accurate insight into the classification process while remaining accessible to non-experts. Experimental results show that the NLEs constructed by our method are significantly more plausible and faithful. In particular, user interventions in the neural network structure (masking of neurons) are three times more effective than the baselines.

Problem

Research questions and friction points this paper is trying to address.

Generates faithful natural language explanations for image classification.

Proposes a post-hoc method without altering CNN training or performance.

Enhances plausibility and faithfulness of explanations through neural analysis.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Post-hoc natural language explanation method

Analyzes influential neurons and activation maps

Converts structured meaning representation into text

🔎 Similar Papers

No similar papers found.