AI Summary
Existing attribution methods assign importance to static input features at a single point in time, so they do not capture how importance shifts over the course of sequential decision-making in Markov Decision Processes (MDPs). This work extends attribution-based explanations to MDPs by formally characterizing the importance of individual states and execution paths. The authors compute these importance scores using strategy synthesis techniques, which keeps the computation efficient despite the non-determinism inherent in an MDP. An evaluation on five case studies shows that the approach yields interpretable insights into which states and paths shape the behavior of sequential decision-making agents.
Abstract
Attribution techniques explain the outcome of an AI model by assigning a numerical score to its inputs. So far, these techniques have mainly focused on attributing importance to static input features at a single point in time, and thus fail to generalize to sequential decision-making settings. This paper fills this gap by introducing techniques to generate attribution-based explanations for Markov Decision Processes (MDPs). We give a formal characterization of what attributions should represent in MDPs, focusing on explanations that assign importance scores to both individual states and execution paths. We show how importance scores can be computed by leveraging techniques for strategy synthesis, enabling the efficient computation of these scores despite the non-determinism inherent in an MDP. We evaluate our approach on five case studies, demonstrating its utility in providing interpretable insights into the logic of sequential decision-making agents.
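To make the idea of a state-level importance score concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's definition or algorithm: the abstract does not spell out the formal characterization, so the score used here (the drop in the best achievable probability of reaching a goal when a state is made unusable) is a hypothetical stand-in. The toy MDP, the function names `max_reach_prob` and `state_importance`, and the use of plain value iteration in place of a dedicated strategy synthesis tool are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding: state -> action -> list of (probability, successor).
MDP = Dict[str, Dict[str, List[Tuple[float, str]]]]


def max_reach_prob(
    mdp: MDP,
    targets: FrozenSet[str],
    blocked: FrozenSet[str] = frozenset(),
    max_iters: int = 10_000,
    tol: float = 1e-12,
) -> Dict[str, float]:
    """Maximum probability of reaching `targets` from each state.

    Value iteration from below: targets are fixed at 1, blocked states and
    states without actions are fixed at 0, and every other state takes the
    best expected value over its actions (resolving the non-determinism).
    """
    values = {s: 1.0 if (s in targets and s not in blocked) else 0.0 for s in mdp}
    for _ in range(max_iters):
        delta = 0.0
        for state, actions in mdp.items():
            if state in targets or state in blocked or not actions:
                continue
            best = max(
                sum(p * values[nxt] for p, nxt in successors)
                for successors in actions.values()
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    return values


def state_importance(mdp: MDP, init: str, targets: FrozenSet[str], state: str) -> float:
    """Assumed score: loss in optimal reach probability when `state` is blocked."""
    baseline = max_reach_prob(mdp, targets)[init]
    without = max_reach_prob(mdp, targets, blocked=frozenset({state}))[init]
    return baseline - without


# Toy MDP (invented for illustration): two routes from s0 to the goal.
toy_mdp: MDP = {
    "s0": {"a": [(1.0, "s1")], "b": [(0.5, "s2"), (0.5, "fail")]},
    "s1": {"go": [(0.9, "goal"), (0.1, "fail")]},
    "s2": {"go": [(1.0, "goal")]},
    "goal": {},
    "fail": {},
}
goal = frozenset({"goal"})

scores = {s: state_importance(toy_mdp, "s0", goal, s) for s in ("s1", "s2")}
print(scores)
```

In this toy model the best strategy from `s0` goes through `s1` (success probability 0.9), so blocking `s1` forces the fallback route with probability 0.5 and `s1` gets a score of about 0.4. Blocking `s2` does not change the optimum, so its score is 0. A path-level score could be sketched along the same lines, but how the paper defines it is not stated in the abstract.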