🤖 AI Summary
Cross-domain transfer of visual interpretability for image classification typically requires additional target-domain annotations and incurs high computational costs. Method: This paper proposes a task-arithmetic-based framework for transferring the interpretability of self-explaining models, introducing the novel concept of "explainability vectors." It constructs a source-domain self-explaining model from a vision-language pretrained model and enables zero-shot transfer to the target domain via task arithmetic in parameter space, without target-domain labels or fine-tuning. Contribution/Results: The approach significantly reduces deployment overhead. Experiments demonstrate that explainability vectors trained on ImageNet generalize well and remain robust across diverse datasets. Their explanation quality matches that of Kernel SHAP while requiring only a single forward pass versus Kernel SHAP's 150 inference steps, and the original classification accuracy is preserved.
📝 Abstract
In scenarios requiring both prediction and explanation efficiency for image classification, self-explaining models that perform both tasks in a single inference are effective. However, their training incurs substantial labeling and computational costs. This study tackles this issue by proposing a method to transfer the visual explainability of self-explaining models, learned in a source domain, to a target domain based on a task arithmetic framework. Specifically, we construct a self-explaining model by extending image classifiers based on a vision-language pretrained model. We then define an "explainability vector" as the difference between model parameters trained on the source domain with and without explanation supervision. Based on the task arithmetic framework, we impart explainability to a model trained only on the prediction task in the target domain by applying the explainability vector. Experimental results on various image classification datasets demonstrate that, except for transfers between some less-related domains, visual explainability can be successfully transferred from source to target domains, improving explanation quality in the target domain without sacrificing classification accuracy. Furthermore, we show that the explainability vector learned on a large and diverse dataset like ImageNet, extended with explanation supervision, exhibits universality and robustness, improving explanation quality on nine out of ten different target datasets. We also find that the explanation quality achieved with a single model inference is comparable to that of Kernel SHAP, which requires 150 model inferences.
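The parameter-space operation the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the parameter dictionaries, the scaling coefficient `lam`, and the function names are all assumptions introduced here for clarity.

```python
# Hypothetical sketch of the explainability-vector idea via task arithmetic.
# theta_* are parameter dicts (e.g., PyTorch-style state_dicts mapping
# parameter names to tensors or floats); names and `lam` are illustrative.

def explainability_vector(theta_src_explained, theta_src_plain):
    """Per-parameter difference between a source-domain model trained
    WITH explanation supervision and one trained WITHOUT it."""
    return {k: theta_src_explained[k] - theta_src_plain[k]
            for k in theta_src_plain}

def apply_explainability(theta_target, vector, lam=1.0):
    """Impart explainability to a target-domain classifier by adding the
    scaled explainability vector to its parameters (task arithmetic)."""
    return {k: theta_target[k] + lam * vector[k] for k in theta_target}
```

Because the transfer is a single addition in parameter space, no target-domain labels or fine-tuning steps are needed, which is what makes the deployment zero-shot.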