Attribution-based Explanations for Markov Decision Processes

📅 2026-05-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing attribution methods score static input features at a single point in time and so do not capture how the importance of inputs changes during sequential decision-making in Markov Decision Processes (MDPs). This work extends attribution-based interpretability to that setting by formally defining what importance means for individual states and for execution paths. The authors compute these importance scores using strategy-synthesis techniques, which keeps the computation efficient despite the non-determinism of an MDP. Five case studies illustrate how the resulting scores expose the states and paths that most influence an agent's behavior.

📝 Abstract

Attribution techniques explain the outcome of an AI model by assigning a numerical score to its inputs. So far, these techniques have mainly focused on attributing importance to static input features at a single point in time, and thus fail to generalize to sequential decision-making settings. This paper fills this gap by introducing techniques to generate attribution-based explanations for Markov Decision Processes (MDPs). We give a formal characterization of what attributions should represent in MDPs, focusing on explanations that assign importance scores to both individual states and execution paths. We show how importance scores can be computed by leveraging techniques for strategy synthesis, enabling the efficient computation of these scores despite the non-determinism inherent in an MDP. We evaluate our approach on five case studies, demonstrating its utility in providing interpretable insights into the logic of sequential decision-making agents.
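The abstract does not state the paper's formal definitions, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes one plausible notion of state importance: how much the best achievable probability of reaching a goal drops when a state is turned into a dead end. The optimal strategy is obtained by plain value iteration, which resolves the MDP's non-determinism by maximizing over actions. The toy MDP, the function names, and the "blocking" definition are all assumptions made for this example.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding: state -> action -> list of (probability, next_state).
MDP = Dict[str, Dict[str, List[Tuple[float, str]]]]


def max_reach_prob(mdp: MDP, goal: FrozenSet[str],
                   blocked: FrozenSet[str] = frozenset(),
                   iters: int = 1000, tol: float = 1e-12) -> Dict[str, float]:
    """Value iteration for the maximal probability of reaching `goal`.

    States in `blocked` are treated as dead ends (their value stays 0).
    """
    value = {s: 1.0 if s in goal else 0.0 for s in mdp}
    for _ in range(iters):
        delta = 0.0
        for s, actions in mdp.items():
            if s in goal or s in blocked or not actions:
                continue
            # Resolve non-determinism by picking the best action.
            best = max(sum(p * value[t] for p, t in succ)
                       for succ in actions.values())
            delta = max(delta, abs(best - value[s]))
            value[s] = best
        if delta < tol:
            break
    return value


def state_importance(mdp: MDP, init: str,
                     goal: FrozenSet[str]) -> Dict[str, float]:
    """Assumed score: loss in optimal reachability when a state is blocked."""
    base = max_reach_prob(mdp, goal)[init]
    return {
        s: base - max_reach_prob(mdp, goal, blocked=frozenset({s}))[init]
        for s in mdp if s not in goal and s != init
    }


# Toy MDP (made up for illustration).
toy: MDP = {
    "s0": {"a": [(0.5, "s1"), (0.5, "s2")], "b": [(1.0, "s2")]},
    "s1": {"go": [(1.0, "goal")]},
    "s2": {"go": [(0.4, "goal"), (0.6, "fail")]},
    "goal": {},
    "fail": {},
}

baseline = max_reach_prob(toy, frozenset({"goal"}))["s0"]
scores = state_importance(toy, "s0", frozenset({"goal"}))
print(baseline, scores)
```

Under this assumed definition, "s1" scores higher than "s2" because the optimal strategy loses more when "s1" is unavailable, while "fail" scores zero because no optimal strategy relies on it. A path-level score could be built analogously, but the abstract gives no details on how the paper defines it.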

Problem

Research questions and friction points this paper is trying to address.

attribution-based explanations

Markov Decision Processes

sequential decision-making

interpretability

importance scores

Innovation

Methods, ideas, or system contributions that make the work stand out.

attribution-based explanation

Markov Decision Processes

sequential decision-making