Zero-Shot Faithful Textual Explanations via Directional-Derivative Influence on Predictions

📅 2026-05-16

🤖 AI Summary

This work addresses the limited faithfulness of existing zero-shot textual explanation methods, which often miss the features that actually drive an image classifier's decisions. To overcome this, the paper proposes FaithTrace, a framework that builds a proxy metric for faithfulness: it maps a textual explanation to a direction in the classifier's feature space and uses the directional derivative of the class logit along that direction to measure how strongly the explained concept influences the prediction. The same influence score is extended into quantitative metrics for evaluating the faithfulness of textual explanations, an area that currently lacks established evaluation tools. The framework combines zero-shot explanation generation with this influence quantification. Experiments show that FaithTrace produces more faithful explanations than current baselines, reflecting the model's actual decision-making more closely and helping users form a more accurate understanding of its behavior.
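The influence score can be written out as a worked definition. The notation is an assumption for illustration only: z is the classifier's feature vector for an image, f_c(z) is the logit of class c (assumed differentiable in z), and v_t is the feature-space direction induced by explanation text t. Normalizing v_t to unit length is also an assumption, since the summary does not say whether the direction is normalized.

```latex
\hat{v}_t = \frac{v_t}{\lVert v_t \rVert}, \qquad
I_c(t) = D_{\hat{v}_t} f_c(z)
       = \lim_{h \to 0} \frac{f_c(z + h\,\hat{v}_t) - f_c(z)}{h}
       = \nabla_z f_c(z)^{\top} \hat{v}_t .
```

Under this reading, a large positive I_c(t) means that moving the representation toward the concept described by t raises the class logit, which is the sense in which the score can serve as a proxy for faithfulness.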

📝 Abstract

Zero-shot textual explanations aim to make image classifiers more transparent by probing their internal representations, without relying on task-specific supervision or large vision-language models (LVLMs). However, existing methods often miss the features that truly drive the prediction, resulting in limited faithfulness to the evidence underlying the model's decision. To address this, we propose FaithTrace. Motivated by the idea that faithful explanations should describe concepts that strongly influence the prediction, FaithTrace directly measures how much the representation induced by the explanation changes the class logit. We introduce an influence score, computed as the directional derivative of the class logit along the text-induced direction in the classifier's feature space, and use it as a proxy for faithfulness. Moreover, we extend this influence score into quantitative evaluation metrics, helping fill the gap in faithfulness evaluation for textual explanations. Experiments show that FaithTrace yields more faithful explanations than baselines, facilitating a more accurate understanding of the model. The code will be publicly released.
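The influence score described above can be sketched in NumPy under loudly stated assumptions; this is an illustration, not FaithTrace's released code. The classifier head is a hypothetical one-hidden-layer tanh network on feature vectors z, and the text-induced direction is produced by a hypothetical linear projection P of a text embedding (the abstract does not say how that direction is obtained). The names `class_logit`, `influence_score`, `influence_fd`, and `P` are all invented for this example. The score is the gradient of the logit dotted with the unit direction, and a central finite difference is included as a sanity check.

```python
import numpy as np

# ASSUMPTION: a hypothetical one-hidden-layer classifier head on features z;
# FaithTrace's actual classifier and feature space are not specified here.
def class_logit(z, W1, b1, w2, b2):
    """Logit of one class: w2 . tanh(W1 z + b1) + b2."""
    return float(w2 @ np.tanh(W1 @ z + b1) + b2)

def logit_grad(z, W1, b1, w2, b2):
    """Analytic gradient of class_logit with respect to z."""
    h = np.tanh(W1 @ z + b1)
    # Chain rule: d tanh(u)/du = 1 - tanh(u)^2, then back through W1.
    return W1.T @ (w2 * (1.0 - h ** 2))

def influence_score(z, v, params):
    """Directional derivative of the logit at z along the unit vector v/||v||."""
    v_hat = v / np.linalg.norm(v)
    return float(logit_grad(z, *params) @ v_hat)

def influence_fd(z, v, params, eps=1e-5):
    """Central finite-difference estimate of the same directional derivative."""
    v_hat = v / np.linalg.norm(v)
    return (class_logit(z + eps * v_hat, *params)
            - class_logit(z - eps * v_hat, *params)) / (2 * eps)

# Toy usage with random weights (illustrative only, not real data or results).
rng = np.random.default_rng(0)
d, k, e = 8, 16, 5                      # feature dim, hidden units, text-embedding dim
params = (rng.normal(size=(k, d)), rng.normal(size=k), rng.normal(size=k), 0.1)
z = rng.normal(size=d)                  # feature vector of one image
P = rng.normal(size=(d, e))             # HYPOTHETICAL text-to-feature projection
candidates = {name: rng.normal(size=e)  # random stand-ins for text embeddings
              for name in ["striped fur", "pointed ears", "grass background"]}

# Influence of each candidate explanation: derivative of the logit along P @ t.
scores = {name: influence_score(z, P @ t, params) for name, t in candidates.items()}
best = max(scores, key=scores.get)      # candidate whose direction most increases the logit
print(best, scores[best])
```

Normalizing the direction makes the score depend only on which way the explanation pushes the representation, not on the norm of its embedding. Ranking by the signed score (rather than its absolute value) is a further assumption: it favors explanations whose concepts support the predicted class.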

Problem

Research questions and friction points this paper is trying to address.

zero-shot textual explanations

faithfulness

image classifiers

model transparency

explanation evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot explanation

Faithfulness

Directional derivative