🤖 AI Summary
Existing image classification explanation methods face a fundamental trade-off: formal approaches (e.g., logical explanations) rely on strong model assumptions and thus fail to accommodate black-box image models, whereas black-box methods lack theoretical rigor. This paper proposes a causal explanation framework for image classification, formally defining three key desiderata for explanations: sufficiency, contrastivity, and completeness. The authors introduce Confidence-Aware Complete Causal Explanations (CACE), a gradient-free, parameter-free, model-architecture-agnostic method that enables rigorous causal reasoning over black-box classifiers. By integrating causal inference with counterfactual analysis and confidence-aware modeling, CACE supports verifiable, formal explanation generation. Extensive experiments across diverse architectures demonstrate that CACE uncovers distinct, semantically meaningful causal feature patterns. Explanations take under six seconds per image on average, achieving both theoretical soundness and practical scalability.
📝 Abstract
Existing algorithms for explaining the outputs of image classifiers are based on a variety of approaches and produce explanations that lack formal rigor. On the other hand, logic-based explanations are formally and rigorously defined, but their computability relies on strict assumptions about the model that do not hold for image classifiers.
In this paper, we show that causal explanations, in addition to being formally and rigorously defined, enjoy the same formal properties as logic-based ones, while still lending themselves to black-box algorithms and being a natural fit for image classifiers. We prove formal properties of causal explanations and introduce contrastive causal explanations for image classifiers. Moreover, we augment the definition of explanation with confidence awareness and introduce complete causal explanations: explanations that are classified with exactly the same confidence as the original image.
We implement our definitions, and our experimental results demonstrate that different models have different patterns of sufficiency, contrastiveness, and completeness. Our algorithms are efficiently computable, taking on average 6s per image on a ResNet50 model to compute all types of explanations, and are fully black-box: they need no knowledge of the model, no access to model internals or gradients, and no model properties such as monotonicity.