🤖 AI Summary
This work addresses the lack of interpretability in deep visual models by proposing a method that unifies local and global explanations into human-readable monotone disjunctive normal form (MDNF) logical formulas. The approach characterizes classification rationales for individual images or image sets using human-recognizable primitive concepts and supports the generation of multi-class explanation lists. By extracting high-fidelity, high-coverage logical explanations from black-box models, the method substantially improves interpretability while keeping the formulas concise. Experimental results demonstrate that the proposed technique delivers accurate and comprehensible explanations on complex visual datasets, effectively balancing fidelity to the original model with human-understandable reasoning.
📝 Abstract
While deep neural networks are extremely effective at classifying images, they remain opaque and hard to interpret. We introduce local and global explanation methods for black-box models that generate explanations in terms of human-recognizable primitive concepts. Both the local explanations for a single image and the global explanations for a set of images are cast as logical formulas in monotone disjunctive normal form (MDNF), whose satisfaction guarantees that the model yields a high score on a given class. We also present an algorithm for explaining the classification of examples into multiple classes in the form of a monotone explanation list over primitive concepts. Despite their simplicity and interpretability, we show that the explanations maintain high fidelity and coverage with respect to the black-box models they seek to explain on challenging vision datasets.
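To make the explanation formats concrete, the following is a minimal sketch of how an MDNF formula and a monotone explanation list might be evaluated. The representation (terms as sets of concept names), the concept names, and the class labels are all illustrative assumptions, not taken from the paper.

```python
# A monotone DNF (MDNF) formula is a disjunction (OR) of terms, where each
# term is a conjunction (AND) of positive concept literals -- no negations.
# We represent a formula as a list of sets of concept names (assumption).

def mdnf_satisfied(formula, present_concepts):
    """True if some term's concepts are all present in the image's concept set."""
    return any(term <= present_concepts for term in formula)

# An explanation list pairs classes with MDNF formulas; the first formula
# satisfied by the image's concepts determines the predicted class.
def explanation_list_predict(expl_list, present_concepts, default="unknown"):
    for label, formula in expl_list:
        if mdnf_satisfied(formula, present_concepts):
            return label
    return default

# Hypothetical concepts and classes, purely for illustration.
zebra = [{"striped", "four_legged"}, {"striped", "hoofed"}]
tiger = [{"striped", "whiskered"}]
expl_list = [("tiger", tiger), ("zebra", zebra)]

print(mdnf_satisfied(zebra, {"striped", "four_legged", "grass"}))   # True
print(explanation_list_predict(expl_list, {"striped", "hoofed"}))   # zebra
```

Monotonicity (only positive literals) is what keeps these formulas readable: each term is simply a sufficient set of concepts whose joint presence triggers the class.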