🤖 AI Summary
This work addresses the absence of an explanatory framework for black-box models that supports continuous integration, updating, and auditing throughout their entire lifecycle. It introduces constructive empiricism into explainable artificial intelligence (XAI) and proposes the “Scientific Theory of Black Boxes” (SToBB) framework, which constructs dynamic, transparent, and auditable explanatory artifacts through traceable hypothesis classes, extensible observational foundations, and modular algorithmic components. Central to the framework are rule-based surrogate models, an online learning mechanism, structured documentation, and a novel Constructive Box Theoriser (CoBoT) algorithm. Empirical evaluation on neural network classifiers for tabular data demonstrates that SToBB effectively maintains explanation consistency, facilitates third-party auditing and analytical reuse, and satisfies diverse user queries.
📝 Abstract
Explainable AI (XAI) offers a growing number of algorithms that aim to answer specific questions about black-box models. What is missing is a principled way to consolidate explanatory information about a fixed black-box model into a persistent, auditable artefact that accompanies the black box throughout its life cycle. We address this gap by introducing the notion of a scientific theory of a black box (SToBB). Grounded in Constructive Empiricism, a SToBB fulfils three obligations: (i) empirical adequacy with respect to all available observations of black-box behaviour, (ii) adaptability via explicit update commitments that restore adequacy when new observations arrive, and (iii) auditability through transparent documentation of assumptions, construction choices, and update behaviour. We operationalise these obligations as a general framework that specifies an extensible observation base, a traceable hypothesis class, algorithmic components for construction and revision, and documentation sufficient for third-party assessment. Explanations for concrete stakeholder needs are then obtained by querying the maintained record through interfaces, rather than by producing isolated method outputs. As a proof of concept, we instantiate a complete SToBB for a neural-network classifier on a tabular task and introduce the Constructive Box Theoriser (CoBoT) algorithm, an online procedure that constructs and maintains an empirically adequate rule-based surrogate as observations accumulate. Together, these contributions position SToBBs as a life cycle-scale, inspectable point of reference that supports consistent, reusable analyses and systematic external scrutiny.
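To make the online construct-and-maintain idea concrete, the following is a minimal, hypothetical sketch of an empirically adequate rule-based surrogate with an explicit update commitment. It is not the paper's CoBoT algorithm: the names (`Rule`, `RuleSurrogate`), the axis-aligned interval rules, and the "add a narrow rule around a misclassified point" repair strategy are all illustrative assumptions made for this sketch.

```python
# Illustrative sketch only: a rule-based surrogate that records every
# observation of black-box behaviour and repairs itself whenever it
# disagrees with the black box, so empirical adequacy on the observation
# base is restored after each update.
from dataclasses import dataclass, field

@dataclass
class Rule:
    bounds: dict   # axis-aligned conditions per feature: {index: (low, high)}
    label: int

    def covers(self, x):
        return all(lo <= x[i] <= hi for i, (lo, hi) in self.bounds.items())

@dataclass
class RuleSurrogate:
    rules: list = field(default_factory=list)
    default: int = 0
    observations: list = field(default_factory=list)  # extensible observation base

    def predict(self, x):
        # First matching rule wins; otherwise fall back to the default label.
        for r in self.rules:
            if r.covers(x):
                return r.label
        return self.default

    def update(self, x, y):
        """Record an observation (x, y) of the black box. If the surrogate
        disagrees with the black box's output y, restore adequacy by adding
        a narrow rule around x (a deliberately simple update commitment)."""
        self.observations.append((x, y))
        if self.predict(x) != y:
            eps = 0.5  # illustrative rule width, an assumption of this sketch
            self.rules.append(
                Rule({i: (v - eps, v + eps) for i, v in enumerate(x)}, y)
            )

    def empirically_adequate(self):
        # Adequacy here means: the surrogate reproduces every recorded
        # black-box output, so it can serve as a queryable point of reference.
        return all(self.predict(x) == y for x, y in self.observations)
```

In use, the surrogate is fed (input, black-box output) pairs as they arrive and can then be queried in place of isolated method outputs; because every repair is a recorded rule over a recorded observation base, the artefact stays inspectable by third parties.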