🤖 AI Summary
Prototype-based (case-based reasoning) networks are often described as interpretable by design, yet their prototype explanations can be misleading: instances with different predictions may share the same explanation, which undermines trust in safety-critical settings. To address this, the authors propose Abductive Latent Explanations (ALEs), a formalism inspired by formal eXplainable AI (FXAI) that expresses sufficient conditions on an instance's intermediate (latent) representation which imply the prediction, so that each explanation formally guarantees its associated outcome. The approach combines the inherent interpretability of case-based reasoning with the guarantees of formal XAI. The authors introduce a solver-free, scalable algorithm for generating ALEs based on three distinct paradigms, compare these paradigms, and demonstrate the feasibility of the approach on standard and fine-grained image classification benchmarks.
📝 Abstract
Case-based reasoning networks are machine-learning models that make predictions based on similarity between the input and prototypical parts of training samples, called prototypes. Such models are able to explain each decision by pointing to the prototypes that contributed the most to the final outcome. As the explanation is a core part of the prediction, they are often qualified as "interpretable by design". While promising, we show that such explanations are sometimes misleading, which hampers their usefulness in safety-critical contexts. In particular, several instances may lead to different predictions and yet have the same explanation. Drawing inspiration from the field of formal eXplainable AI (FXAI), we propose Abductive Latent Explanations (ALEs), a formalism to express sufficient conditions on the intermediate (latent) representation of the instance that imply the prediction. Our approach combines the inherent interpretability of case-based reasoning models and the guarantees provided by formal XAI. We propose a solver-free and scalable algorithm for generating ALEs based on three distinct paradigms, compare them, and demonstrate the feasibility of our approach on diverse datasets for both standard and fine-grained image classification. The associated code can be found at https://github.com/julsoria/ale.
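The core idea of the abstract — a sufficient condition on the latent representation that implies the prediction — can be illustrated with a toy sketch. Everything below is an assumption for illustration only (a random linear scoring head over prototype similarities, and an interval "box" as the form of the condition); it is not the paper's ALE algorithm or any of its three paradigms:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy prototype-network head: class scores are weighted sums of
# prototype similarities (sizes and weights are illustrative).
n_protos, n_classes = 6, 3
W = rng.normal(size=(n_classes, n_protos))   # prototype-to-class weights
sims = rng.uniform(0, 1, size=n_protos)      # similarities for one input

pred = int(np.argmax(W @ sims))              # predicted class

def box_guarantees_prediction(W, lo, hi, pred):
    """Check a sufficient condition: for every rival class c, minimise
    score[pred] - score[c] over all similarity vectors s with
    lo <= s <= hi. If the worst-case margin is positive for all rivals,
    every point in the box yields the same prediction."""
    for c in range(W.shape[0]):
        if c == pred:
            continue
        d = W[pred] - W[c]
        # Minimise d @ s over the box: pick lo where d > 0, hi where d < 0.
        worst = np.where(d > 0, lo, hi) @ d
        if worst <= 0:
            return False
    return True

# A candidate explanation: a small interval around the observed similarities.
lo = np.clip(sims - 0.05, 0.0, 1.0)
hi = np.clip(sims + 0.05, 0.0, 1.0)
print(box_guarantees_prediction(W, lo, hi, pred))
```

If the check passes, the box is a sufficient condition in the spirit of an ALE: any input whose latent similarities fall inside it provably receives the same prediction, so two inputs with the same (box) explanation cannot disagree on the outcome.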