Enhancing Interpretability for Vision Models via Shapley Value Optimization

📅 2025-12-16

📈 Citations: 0

✨ Influential: 0

career value

182K/year

🤖 AI Summary

Insufficient interpretability of deep vision models remains a critical challenge: post-hoc explanation methods suffer from low fidelity, while self-explaining models compromise accuracy and architectural generality. To address this, we propose a Shapley-value-driven joint training framework that integrates Shapley attribution as a differentiable auxiliary task into the backbone network’s end-to-end training—without modifying the model architecture—enabling fair and consistent pixel- or patch-level attributions aligned with predictions. Our method introduces the first gradient-differentiable Shapley approximation optimization mechanism, compatible with both Vision Transformers (ViTs) and CNNs in a plug-and-play manner. Evaluated on multiple benchmarks, it achieves state-of-the-art interpretability: attribution fidelity improves by 12.7%, with classification accuracy degradation under 0.3%. The approach thus uniquely balances high interpretability, minimal performance trade-off, and broad architectural compatibility.

Technology Category

Application Category

📝 Abstract

Deep neural networks have demonstrated remarkable performance across various domains, yet their decision-making processes remain opaque. Although many explanation methods are dedicated to bringing the obscurity of DNNs to light, they exhibit significant limitations: post-hoc explanation methods often struggle to faithfully reflect model behaviors, while self-explaining neural networks sacrifice performance and compatibility due to their specialized architectural designs. To address these challenges, we propose a novel self-explaining framework that integrates Shapley value estimation as an auxiliary task during training, which achieves two key advancements: 1) a fair allocation of the model prediction scores to image patches, ensuring explanations inherently align with the model's decision logic, and 2) enhanced interpretability with minor structural modifications, preserving model performance and compatibility. Extensive experiments on multiple benchmarks demonstrate that our method achieves state-of-the-art interpretability.

Problem

Research questions and friction points this paper is trying to address.

Improves interpretability of vision models via Shapley value optimization

Ensures explanations align with model decisions without sacrificing performance

Addresses limitations of post-hoc and self-explaining neural networks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Shapley value estimation as auxiliary training task

Fairly allocates prediction scores to image patches for alignment

Enhances interpretability with minor structural modifications preserving performance

🔎 Similar Papers

No similar papers found.