🤖 AI Summary
Existing image classifier explanation methods struggle to simultaneously capture global decision logic and local sample-specific characteristics, leading to ambiguous or distorted interpretations. This paper proposes a generative explainability framework based on Class-Association Embedding (CAE): first, it constructs a low-dimensional class-association manifold that disentangles class-shared features from instance-specific ones; second, it introduces a block-wise consistent feature disentanglement algorithm and a differentiable feature recomposition mechanism, enabling continuous, directed, and semantically coherent counterfactual image generation and decision-path visualization. By operating intrinsically within the learned manifold, our approach avoids local optimization pitfalls and achieves, for the first time, class-level semantic-guided interpretability. Evaluated on multiple benchmark datasets, it improves explanation accuracy by 12.3% over state-of-the-art methods, generates high-fidelity counterfactuals, and uncovers intrinsic structures of classification boundaries and data distributions.
📝 Abstract
Image classification is a primary task in data analy-sis where explainable models are crucially demanded in various applications. Although amounts of methods have been proposed to obtain explainable knowledge from the black-box classifiers, these approaches lack the efficiency of extracting global knowl-edge regarding the classification task, thus is vulnerable to local traps and often leads to poor accuracy. In this study, we propose a generative explanation model that combines the advantages of global and local knowledge for explaining image classifiers. We develop a representation learning method called class association embedding (CAE), which encodes each sample into a pair of separated class-associated and individual codes. Recombining the individual code of a given sample with altered class-associated code leads to a synthetic real-looking sample with preserved individual characters but modified class-associated features and possibly flipped class assignments. A building-block coherency feature extraction algorithm is proposed that efficiently separates class-associated features from individual ones. The extracted feature space forms a low-dimensional manifold that visualizes the classification decision patterns. Explanation on each individual sample can be then achieved in a counter-factual generation manner which continuously modifies the sample in one direction, by shifting its class-associated code along a guided path, until its classification outcome is changed. We compare our method with state-of-the-art ones on explaining image classification tasks in the form of saliency maps, demonstrating that our method achieves higher accuracies. The class-associated manifold not only helps with skipping local traps and achieving accurate explanation, but also provides insights to the data distribution patterns that potentially aids knowledge discovery. The code is available at https://github.com/xrtll/xAI-CODE.