Zero-Shot Textual Explanations via Translating Decision-Critical Features

📅 2025-12-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing zero-shot explanation methods merely describe image content without revealing the classifier's decision rationale. To address this, we propose TEXTER, the first zero-shot text-based explanation method that explicitly isolates and amplifies decision-critical neuron features. TEXTER identifies discriminative visual features via neuron-importance analysis, improves Transformer interpretability with sparse autoencoders, and maps these critical features into the CLIP text embedding space to retrieve natural-language descriptions faithful to the model's internal reasoning. Experiments demonstrate that TEXTER achieves substantially higher fidelity and interpretability than prior approaches. Crucially, it establishes the first end-to-end alignment from activated decision-critical neurons to semantically grounded textual explanations, bridging the gap between low-level neural activations and high-level human-understandable rationales in zero-shot settings.

📝 Abstract
Textual explanations make image classifier decisions transparent by describing the prediction rationale in natural language. Large vision-language models can generate captions but are designed for general visual understanding, not classifier-specific reasoning. Existing zero-shot explanation methods align global image features with language, producing descriptions of what is visible rather than what drives the prediction. We propose TEXTER, which overcomes this limitation by isolating decision-critical features before alignment. TEXTER identifies the neurons contributing to the prediction and emphasizes the features encoded in those neurons -- i.e., the decision-critical features. It then maps these emphasized features into the CLIP feature space to retrieve textual explanations that reflect the model's reasoning. A sparse autoencoder further improves interpretability, particularly for Transformer architectures. Extensive experiments show that TEXTER generates more faithful and interpretable explanations than existing methods. The code will be publicly released.
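The retrieval step described above can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: the emphasis rule, the `alpha` weight, the toy 3-dimensional "embeddings," and all function names are hypothetical, chosen only to show the idea of amplifying decision-critical feature dimensions before nearest-neighbor retrieval in a shared text embedding space.

```python
import math

def cosine(u, v):
    # cosine similarity between two equal-length vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def emphasize(features, importance, alpha=2.0):
    # scale each feature dimension by a (hypothetical) neuron-importance score,
    # amplifying the dimensions that drove the classifier's prediction
    return [f * (1.0 + alpha * w) for f, w in zip(features, importance)]

def retrieve_explanation(image_features, importance, text_bank):
    # text_bank maps candidate descriptions to embeddings in the shared space;
    # return the description closest to the emphasized image features
    emphasized = emphasize(image_features, importance)
    return max(text_bank, key=lambda d: cosine(emphasized, text_bank[d]))

# toy example: without emphasis, the global features sit closer to
# "pointed ears"; emphasizing the decision-critical first dimension
# flips the retrieved explanation to "striped fur"
bank = {
    "striped fur": [0.9, 0.1, 0.0],
    "pointed ears": [0.1, 0.9, 0.1],
}
feats = [0.5, 0.6, 0.1]   # global image features
imp = [0.9, 0.1, 0.0]     # importance: the first neuron drives the prediction
print(retrieve_explanation(feats, imp, bank))  # prints "striped fur"
```

The toy example mirrors the paper's core claim: aligning raw global features with language describes what is visible, while aligning emphasized features describes what drove the prediction.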
Problem

Research questions and friction points this paper is trying to address.

Captions from large vision-language models describe what is visible in an image, not what drives a classifier's prediction
Existing zero-shot explanation methods align global image features with language, so decision-critical features are diluted before alignment
Transformer features are hard to interpret at the neuron level, limiting faithful textual explanations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Isolates decision-critical features before alignment
Maps emphasized features into CLIP space for retrieval
Uses sparse autoencoder to enhance interpretability
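The sparse-autoencoder component can be sketched as a forward pass with an L1 sparsity penalty. Again, this is a schematic illustration, not the paper's architecture: the weights, the `l1_weight` coefficient, and the function names are hypothetical, and a real SAE would use a learned, much larger overcomplete dictionary.

```python
def relu(x):
    # elementwise rectifier; zeros out negative pre-activations
    return [max(0.0, v) for v in x]

def matvec(W, x):
    # multiply matrix W (rows = output dims) by vector x
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sae_forward(x, W_enc, W_dec, l1_weight=0.1):
    # encode into an overcomplete code, decode back, and score the result
    code = relu(matvec(W_enc, x))
    recon = matvec(W_dec, code)
    recon_loss = sum((a - b) ** 2 for a, b in zip(x, recon))
    # the L1 term pushes most code units to exactly zero, so the few
    # active units tend to correspond to interpretable features
    sparsity = l1_weight * sum(abs(c) for c in code)
    return code, recon, recon_loss + sparsity

# toy 2-dim input, 3-unit overcomplete code
code, recon, loss = sae_forward(
    [1.0, -0.5],
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],   # encoder weights
    [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]],     # decoder weights
)
print(code)  # only one unit fires: [1.0, 0.0, 0.0]
```

The point of the sparsity term is that dense Transformer activations mix many concepts per neuron; a sparse overcomplete code disentangles them, which is what makes the decision-critical features nameable in language.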
🔎 Similar Papers
2024-05-27 · Neural Information Processing Systems · Citations: 4
Toshinori Yamauchi (Graduate School of Informatics, Chiba University), Hiroshi Kera (Chiba University), Kazuhiko Kawamoto (Graduate School of Informatics, Chiba University)
Topics: Approximate Computer Algebra, Adversarial Machine Learning, Math Transformer