🤖 AI Summary
This work addresses the lack of formal definitions and verification mechanisms for “explanability” in multi-agent systems. We first model explainability as a system hyperproperty and propose a novel modal logic framework integrating Lewis’s counterfactual logic, Linear Temporal Logic (LTL), and epistemic modalities. This framework precisely captures the semantic question “Is a given observation explainable to a specific agent?” and supports cross-trajectory reasoning via embedding into hyperlogic. Theoretically, we establish the decidability of the corresponding model-checking problem. Practically, we implement automated static and dynamic verification of counterfactual explainability. Our contribution provides the first formal foundation for explainability in autonomous systems that simultaneously ensures expressive power, verifiability, and computational tractability.
📝 Abstract
Explainability is emerging as a key requirement for autonomous systems. While many works have focused on what constitutes a valid explanation, few have considered formalizing explainability as a system property. In this work, we approach this problem from the perspective of hyperproperties. We start with a combination of three prominent flavors of modal logic and show how they can be used for specifying and verifying counterfactual explainability in multi-agent systems: With Lewis' counterfactuals, linear-time temporal logic, and a knowledge modality, we can reason about whether agents know why a specific observation occurs, i.e., whether that observation is explainable to them. We use this logic to formalize multiple notions of explainability on the system level. We then show how this logic can be embedded into a hyperlogic. Notably, from this analysis we conclude that the model-checking problem of our logic is decidable, which paves the way for the automated verification of explainability requirements.