Actionable Interpretability Must Be Defined in Terms of Symmetries

📅 2026-01-19
🤖 AI Summary
This work addresses a gap in existing definitions of interpretability: they do not say how interpretability can be formally tested or designed for, which limits their use in guiding model design. It argues that actionable definitions must be stated in terms of symmetries, and hypothesises that four of them (inference equivariance, information invariance, concept-closure invariance, and structural invariance) suffice to formalise interpretable models as a subclass of probabilistic models. On this basis, the paper formulates interpretable inference tasks such as alignment, interventions, and counterfactuals as a form of Bayesian inversion, and proposes a formal framework for verifying compliance with safety standards and regulations.

📝 Abstract
This paper argues that interpretability research in Artificial Intelligence (AI) is fundamentally ill-posed as existing definitions of interpretability fail to describe how interpretability can be formally tested or designed for. We posit that actionable definitions of interpretability must be formulated in terms of *symmetries* that inform model design and lead to testable conditions. Under a probabilistic view, we hypothesise that four symmetries (inference equivariance, information invariance, concept-closure invariance, and structural invariance) suffice to (i) formalise interpretable models as a subclass of probabilistic models, (ii) yield a unified formulation of interpretable inference (e.g., alignment, interventions, and counterfactuals) as a form of Bayesian inversion, and (iii) provide a formal framework to verify compliance with safety standards and regulations.
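
For orientation, the generic textbook notions behind the terms "equivariance", "invariance", and "Bayesian inversion" can be written as below. This is only a sketch under stated assumptions: the symbols $G$, $g$, $f$, $x$, and $z$ are introduced here for illustration and do not come from the paper, whose own definitions of the four symmetries may differ.

```latex
% Generic textbook definitions, not taken from the paper.
% Assumed symbols: G is a group acting on inputs and outputs,
% f is a model or inference map, x is an observation, z is a latent variable.
\begin{align*}
  \text{equivariance:} \quad & f(g \cdot x) = g \cdot f(x) && \forall g \in G, \\
  \text{invariance:}   \quad & f(g \cdot x) = f(x)          && \forall g \in G, \\
  \text{Bayesian inversion:} \quad & p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)}, \quad p(x) > 0.
\end{align*}
```

Each condition is testable in the sense the abstract asks for: given a candidate $f$ and a set of transformations, one can check whether the stated equality holds.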

Problem

Research questions and friction points this paper is trying to address.

interpretability
actionable
symmetries
AI
formal principles

Innovation

Methods, ideas, or system contributions that make the work stand out.

symmetries
actionable interpretability
Bayesian inversion
interpretable models
formal principles