Enhancing Interpretability for Vision Models via Shapley Value Optimization

📅 2025-12-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Insufficient interpretability of deep vision models remains a critical challenge: post-hoc explanation methods suffer from low fidelity, while self-explaining models compromise accuracy and architectural generality. To address this, we propose a Shapley-value-driven joint training framework that integrates Shapley attribution as a differentiable auxiliary task into the backbone network's end-to-end training. With only minor structural modifications, the framework yields fair, consistent pixel- or patch-level attributions that are aligned with the model's predictions. Our method introduces the first gradient-differentiable mechanism for optimizing a Shapley approximation during training, and it is compatible with both Vision Transformers (ViTs) and CNNs in a plug-and-play manner. Evaluated on multiple benchmarks, it achieves state-of-the-art interpretability: attribution fidelity improves by 12.7% while classification accuracy degrades by less than 0.3%. The approach thus balances high interpretability, a minimal performance trade-off, and broad architectural compatibility.
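For context (standard background, not stated in the summary above): the Shapley value is the unique credit allocation satisfying the efficiency, symmetry, linearity, and null-player axioms. With N the set of n patches and v(S) the model's class score when only the patches in S are present, patch i receives

```latex
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,\bigl[\, v(S \cup \{i\}) - v(S) \,\bigr]
```

Efficiency (the attributions over all patches sum to v(N) − v(∅), i.e., to the prediction score being explained) is exactly the "fair allocation" property the framework trains its explanations to satisfy.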

📝 Abstract
Deep neural networks have demonstrated remarkable performance across various domains, yet their decision-making processes remain opaque. Although many explanation methods are dedicated to bringing the obscurity of DNNs to light, they exhibit significant limitations: post-hoc explanation methods often struggle to faithfully reflect model behaviors, while self-explaining neural networks sacrifice performance and compatibility due to their specialized architectural designs. To address these challenges, we propose a novel self-explaining framework that integrates Shapley value estimation as an auxiliary task during training, which achieves two key advancements: 1) a fair allocation of the model prediction scores to image patches, ensuring explanations inherently align with the model's decision logic, and 2) enhanced interpretability with minor structural modifications, preserving model performance and compatibility. Extensive experiments on multiple benchmarks demonstrate that our method achieves state-of-the-art interpretability.
Problem

Research questions and friction points this paper is trying to address.

Improves interpretability of vision models via Shapley value optimization
Ensures explanations align with model decisions without sacrificing performance
Addresses limitations of post-hoc and self-explaining neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Shapley value estimation as auxiliary training task
Fairly allocates prediction scores to image patches so explanations align with the model's decision logic
Enhances interpretability with minor structural modifications that preserve performance and compatibility (a training sketch follows below)
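The paper's exact objective is not reproduced here; the following is a minimal PyTorch sketch of the general idea, assuming Monte Carlo (permutation-sampling) Shapley targets computed by masking image patches, and an auxiliary head trained to regress those targets alongside the usual cross-entropy loss. All names (`mask_patches`, `mc_shapley`, `attr_head`, `lambda_attr`) are illustrative, not the authors' API.

```python
# Illustrative sketch only: Shapley estimation as an auxiliary training task.
# Assumptions (not from the paper): permutation-sampling Shapley targets,
# patch masking by zeroing pixels, and an MSE auxiliary loss.
import torch
import torch.nn.functional as F


def mask_patches(images, keep, patch=16):
    """Zero out image patches where keep == 0.
    images: (B, C, H, W); keep: (B, P) with P = (H // patch) ** 2."""
    B, _, H, W = images.shape
    g = H // patch  # patches per side (assumes a square grid)
    m = keep.view(B, 1, g, g).float()
    m = F.interpolate(m, size=(H, W), mode="nearest")
    return images * m


@torch.no_grad()
def mc_shapley(model, images, labels, patch=16, n_perm=4):
    """Monte Carlo Shapley targets: each patch's average marginal effect on
    the target-class logit over random patch-insertion orders."""
    B, _, H, _ = images.shape
    P = (H // patch) ** 2
    rows = torch.arange(B, device=images.device)
    phi = torch.zeros(B, P, device=images.device)
    for _ in range(n_perm):
        keep = torch.zeros(B, P, device=images.device)
        prev = model(mask_patches(images, keep, patch))[rows, labels]
        for j in torch.randperm(P).tolist():
            keep[:, j] = 1.0  # insert patch j into the coalition
            cur = model(mask_patches(images, keep, patch))[rows, labels]
            phi[:, j] += cur - prev
            prev = cur
    return phi / n_perm


def joint_step(model, attr_head, images, labels, opt, lambda_attr=0.1):
    """One optimization step: classification loss + Shapley-regression loss."""
    logits = model(images)                       # (B, num_classes)
    targets = mc_shapley(model, images, labels)  # (B, P), no grad
    pred_attr = attr_head(images)                # (B, P), differentiable
    loss = F.cross_entropy(logits, labels) \
        + lambda_attr * F.mse_loss(pred_attr, targets)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In practice the attribution head would share the backbone's features rather than re-encode the image, and the permutation sampling would be truncated or amortized for efficiency; the sketch only shows the shape of the joint objective.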
Kanglong Fan
Department of Computer Science, City University of Hong Kong
Yunqiao Yang
City University of Hong Kong
Transfer Learning, Machine Learning
Chen Ma
Department of Computer Science, City University of Hong Kong