🤖 AI Summary
To address the lack of interpretability in neural network classification behavior, this paper proposes a formal-logic-based spatial explanation method that provides provably correct semantic explanations for model decisions over continuous input regions. Methodologically, it introduces the first integration of Craig interpolation with UNSAT core generation to construct a verifiable framework for local decision-region partitioning. Unlike proxy-model-based or heuristic approximation approaches, the method synthesizes compact, precise, and semantically transparent explanation rules via logical reasoning alone. Experiments on real-world datasets of varying scale show that the generated explanations outperform existing state-of-the-art methods, improving fidelity, comprehensibility, and formal verifiability at the same time. The work establishes a rigorously grounded path toward trustworthy AI through formal explainability.
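The paper's own algorithms are not reproduced here; as a minimal sketch of the two ingredients named above, the Python snippet below uses the `z3-solver` package to prove that a toy linear classifier keeps its decision on an entire input box, and then reads off an UNSAT core that names only the box bounds the proof actually needs. The classifier, bounds, and tracking labels are all illustrative assumptions, not the paper's setup.

```python
# Minimal sketch (not the paper's implementation): prove a toy linear
# classifier's decision on a whole input box, then use an UNSAT core to
# keep only the bounds the proof needs. Requires the z3-solver package.
from z3 import Real, Solver, unsat

x1, x2 = Real("x1"), Real("x2")

# Toy "network": one linear unit; the class is positive iff score > 0.
score = 2 * x1 - 3 * x2 + 1

s = Solver()
# Track each box bound separately so the UNSAT core can name it.
s.assert_and_track(x1 >= 0, "x1_lo")
s.assert_and_track(x1 <= 1, "x1_hi")
s.assert_and_track(x2 >= -1, "x2_lo")
s.assert_and_track(x2 <= 0, "x2_hi")

# Ask for a counterexample: a point in the box with a flipped decision.
s.assert_and_track(score <= 0, "class_flip")

if s.check() == unsat:
    # No counterexample exists: the decision provably holds on the box.
    # The core lists only the bounds used in the proof, yielding a larger
    # (more general) region than the original box.
    print("decision holds on the box; core:", s.unsat_core())
else:
    print("counterexample:", s.model())
```

On this toy instance the core omits `x1_hi` and `x2_lo`: the proof only needs `x1 >= 0` and `x2 <= 0`, so the decision in fact holds on that larger unbounded region, which is the kind of generalization an UNSAT-core-driven explanation exploits.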
📝 Abstract
We present Space Explanations, a novel logic-based concept for neural network classifiers that gives provable guarantees about the behavior of the network on continuous regions of the input feature space. To generate space explanations automatically, we leverage a range of flexible Craig interpolation algorithms together with unsatisfiable core generation. On real-life case studies of small, medium, and large size, we demonstrate that the generated explanations are more meaningful than those computed by the state of the art.
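For intuition about the interpolation ingredient (an illustrative example of ours, not taken from the paper): given an unsatisfiable pair of formulas $A$ and $B$, a Craig interpolant is a formula $I$ over their shared vocabulary such that $A \Rightarrow I$ and $I \wedge B$ is unsatisfiable. For instance, with

$$A \;\equiv\; (x \ge 0) \wedge (y = x + 1), \qquad B \;\equiv\; (y < 0),$$

the formula $I \equiv (y \ge 1)$ is an interpolant: it follows from $A$, contradicts $B$, and mentions only the shared variable $y$. In a space-explanation setting, such interpolants can serve as compact region descriptions that retain only what the correctness proof depends on.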