🤖 AI Summary
To address the lack of interpretability in neural network classification behavior, this paper proposes a formal-logic-based spatial explanation method that provides provably correct semantic explanations for model decisions over continuous input regions. Methodologically, it introduces the first integration of Craig interpolation with UNSAT core generation to construct a verifiable framework for local decision-region partitioning. Unlike proxy-model-based or heuristic approximation approaches, the method synthesizes compact, precise, and semantically transparent explanation rules via logical reasoning alone. Experiments on real-world datasets of varying scale show that the generated explanations outperform existing state-of-the-art methods, improving fidelity, comprehensibility, and formal verifiability at the same time. The work establishes a rigorously grounded path toward trustworthy AI through formal explainability.
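The paper's own algorithms are not reproduced here; as a minimal sketch of the two ingredients named above, the Python snippet below uses the `z3-solver` package to prove that a toy linear classifier keeps its decision on an entire input box, and then reads off an UNSAT core that names only the box bounds the proof actually needs. The classifier, bounds, and tracking labels are all illustrative assumptions, not the paper's setup.

```python
# Minimal sketch (not the paper's implementation): prove a toy linear
# classifier's decision on a whole input box, then use an UNSAT core to
# keep only the bounds the proof needs. Requires the z3-solver package.
from z3 import Real, Solver, unsat

x1, x2 = Real("x1"), Real("x2")

# Toy "network": one linear unit; the class is positive iff score > 0.
score = 2 * x1 - 3 * x2 + 1

s = Solver()
# Track each box bound separately so the UNSAT core can name it.
s.assert_and_track(x1 >= 0, "x1_lo")
s.assert_and_track(x1 <= 1, "x1_hi")
s.assert_and_track(x2 >= -1, "x2_lo")
s.assert_and_track(x2 <= 0, "x2_hi")

# Ask for a counterexample: a point in the box with a flipped decision.
s.assert_and_track(score <= 0, "class_flip")

if s.check() == unsat:
    # No counterexample exists: the decision provably holds on the box.
    # The core lists only the bounds used in the proof, yielding a larger
    # (more general) region than the original box.
    print("decision holds on the box; core:", s.unsat_core())
else:
    print("counterexample:", s.model())
```

On this toy instance the core omits `x1_hi` and `x2_lo`: the proof only needs `x1 >= 0` and `x2 <= 0`, so the decision in fact holds on that larger unbounded region, which is the kind of generalization an UNSAT-core-driven explanation exploits.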
📝 Abstract
We present Space Explanations, a novel logic-based concept for neural network classifiers that gives provable guarantees about the behavior of the network on continuous regions of the input feature space. To generate space explanations automatically, we leverage a range of flexible Craig interpolation algorithms together with unsatisfiable core generation. On real-life case studies of small, medium, and large size, we demonstrate that the generated explanations are more meaningful than those computed by the state of the art.
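For intuition about the interpolation ingredient (an illustrative example of ours, not taken from the paper): given an unsatisfiable pair of formulas $A$ and $B$, a Craig interpolant is a formula $I$ over their shared vocabulary such that $A \Rightarrow I$ and $I \wedge B$ is unsatisfiable. For instance, with

$$A \;\equiv\; (x \ge 0) \wedge (y = x + 1), \qquad B \;\equiv\; (y < 0),$$

the formula $I \equiv (y \ge 1)$ is an interpolant: it follows from $A$, contradicts $B$, and mentions only the shared variable $y$. In a space-explanation setting, such interpolants can serve as compact region descriptions that retain only what the correctness proof depends on.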