🤖 AI Summary
Reinforcement learning (RL) faces deployment limitations in safety-critical domains due to the opacity of its decision-making; existing explainable AI (XAI) methods struggle to support counterfactual, comparative attribution (e.g., "Why action A instead of B?"). To address this, we propose VisionMask: an end-to-end, self-supervised visual masking framework that localizes decision-critical regions in the agent's visual input without modifying or retraining the original RL agent. Its core contribution is a self-supervised visual attribution framework that requires neither surrogate-model retraining nor human annotations, enabling physically plausible counterfactual analysis. VisionMask couples action–visual correlation modeling with rigorous insertion/deletion evaluation. Empirically, it improves insertion accuracy by 14.9% and F1-score by 30.08% over state-of-the-art explanation methods on Super Mario Bros and three Atari games, demonstrating superior fidelity and interpretability.
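The insertion metric mentioned above can be illustrated with a minimal sketch: starting from a blank input, restore pixels in order of attributed importance and measure how often the agent's original action is recovered. The `agent_policy` callable and the pixel-wise restoration schedule here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def insertion_accuracy(saliency, state, agent_policy, baseline=0.0):
    """Sketch of an insertion-style fidelity metric.

    Pixels are restored into a baseline image from most to least salient;
    we record the fraction of steps at which the (frozen) agent reproduces
    its original action. `agent_policy` is a hypothetical stand-in mapping
    a state array to an action index.
    """
    original = agent_policy(state)
    order = np.argsort(saliency.ravel())[::-1]   # most salient pixels first
    current = np.full(state.size, baseline, dtype=float)
    flat = state.astype(float).ravel()
    hits = []
    for idx in order:                            # restore one pixel at a time
        current[idx] = flat[idx]
        hits.append(agent_policy(current.reshape(state.shape)) == original)
    return float(np.mean(hits))                  # higher = more faithful mask
```

A saliency map that concentrates importance on the pixels the agent actually relies on recovers the original action earlier in the curve and thus scores higher.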
📝 Abstract
Due to the inherent lack of transparency in deep neural networks, it is challenging for deep reinforcement learning (DRL) agents to gain trust and acceptance from users, especially in safety-critical applications such as medical diagnosis and military operations. Existing methods for explaining an agent's decisions either require retraining the agent with models that support explanation generation or rely on perturbation-based techniques to reveal the significance of different input features in the decision-making process. However, retraining the agent may compromise its integrity and performance, while perturbation-based methods offer limited performance and lack knowledge accumulation or learning capabilities. Moreover, since each perturbation is performed independently, the joint state of the perturbed inputs may not be physically meaningful. To address these challenges, we introduce **VisionMask**, a standalone explanation model trained end-to-end to identify the most critical regions in the agent's visual input that can explain its actions. VisionMask is trained in a self-supervised manner without relying on human-generated labels. Importantly, its training does not alter the agent model, hence preserving the agent's performance and integrity. We evaluate VisionMask on Super Mario Bros (SMB) and three Atari games. Compared to existing methods, VisionMask achieves 14.9% higher insertion accuracy and a 30.08% higher F1-score in reproducing original actions from the selected visual explanations. We also present examples illustrating how VisionMask can be used for counterfactual analysis.
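The counterfactual analysis described above can be sketched as a simple probe: blank out the region the mask flags as decision-critical and query the frozen agent again to see which alternative action it would take. Everything here (`agent_policy`, the binary `mask`, the zero baseline) is a hypothetical stand-in for illustration, not the paper's code.

```python
import numpy as np

def counterfactual_query(state, mask, agent_policy, baseline=0.0):
    """Counterfactual probe on a frozen agent (illustrative sketch).

    `mask` marks the region an explanation method deems decision-critical;
    ablating it to a baseline value and re-querying the unchanged agent
    answers "Why action A instead of B?" without retraining anything.
    """
    factual = agent_policy(state)
    ablated = np.where(mask > 0.5, baseline, state)  # blank critical region
    counterfactual = agent_policy(ablated)
    return factual, counterfactual  # differing actions => region mattered
```

If the two returned actions differ, the masked region was indeed causally relevant to the original decision; if they match, the explanation over- or mis-attributed importance.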