🤖 AI Summary
Insufficient interpretability of deep vision models remains a critical challenge: post-hoc explanation methods suffer from low fidelity, while self-explaining models compromise accuracy and architectural generality. To address this, we propose a Shapley-value-driven joint training framework that integrates Shapley attribution as a differentiable auxiliary task into the backbone network’s end-to-end training—without modifying the model architecture—enabling fair and consistent pixel- or patch-level attributions aligned with predictions. Our method introduces the first gradient-differentiable Shapley approximation optimization mechanism, compatible with both Vision Transformers (ViTs) and CNNs in a plug-and-play manner. Evaluated on multiple benchmarks, it achieves state-of-the-art interpretability: attribution fidelity improves by 12.7%, with classification accuracy degradation under 0.3%. The approach thus uniquely balances high interpretability, minimal performance trade-off, and broad architectural compatibility.
📝 Abstract
Deep neural networks have demonstrated remarkable performance across various domains, yet their decision-making processes remain opaque. Although many explanation methods are dedicated to bringing the obscurity of DNNs to light, they exhibit significant limitations: post-hoc explanation methods often struggle to faithfully reflect model behaviors, while self-explaining neural networks sacrifice performance and compatibility due to their specialized architectural designs. To address these challenges, we propose a novel self-explaining framework that integrates Shapley value estimation as an auxiliary task during training, which achieves two key advancements: 1) a fair allocation of the model prediction scores to image patches, ensuring explanations inherently align with the model's decision logic, and 2) enhanced interpretability with minor structural modifications, preserving model performance and compatibility. Extensive experiments on multiple benchmarks demonstrate that our method achieves state-of-the-art interpretability.