🤖 AI Summary
Existing methods struggle to simultaneously achieve human-perceptual consistency and intrinsic interpretability in model explanations. Method: This paper proposes an image-level interpretable prediction framework that jointly learns semantic grouping masks and dynamic sparsity at the instance level; it replaces pixel-wise sparsity with semantic superpixels to enable differentiable, end-to-end mask optimization, and introduces— for the first time—an instance-adaptive sparsity control mechanism, allowing the model to autonomously determine the granularity of explanation per image. Contribution/Results: Our method significantly outperforms state-of-the-art approaches on both semi-synthetic and natural image datasets. The generated explanations better align with human cognitive priors, substantially improving prediction trustworthiness and enhancing the model’s capability to attribute failures to semantically meaningful regions.
📝 Abstract
Understanding the decision-making process of machine learning models provides valuable insights into the task, the data, and the reasons behind a model's failures. In this work, we propose a method that performs inherently interpretable predictions through the instance-wise sparsification of input images. To align the sparsification with human perception, we learn the masking in the space of semantically meaningful pixel regions rather than on pixel-level. Additionally, we introduce an explicit way to dynamically determine the required level of sparsity for each instance. We show empirically on semi-synthetic and natural image datasets that our inherently interpretable classifier produces more meaningful, human-understandable predictions than state-of-the-art benchmarks.