A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges

📅 2022-11-12
📈 Citations: 1
Influential: 0
🤖 AI Summary
Deep reinforcement learning (DRL) excels in complex control tasks but its black-box nature severely hinders trustworthy deployment in safety-critical and high-reliability applications. Method: This paper presents a systematic survey of eXplainable Reinforcement Learning (XRL), proposing the first unified taxonomy spanning four dimensions—model, reward, state, and task—and formally integrating human-prior-guided RL methods into the XRL theoretical framework. Through comprehensive literature analysis, taxonomic modeling, and systematic curation of open-source resources (hosted in a dedicated GitHub knowledge base), we construct a holistic XRL landscape. Contribution/Results: Our work fills a critical gap in multi-dimensional interpretability classification and advances XRL from methodological exploration toward practical, trustworthy, and safe RL deployment. It clarifies core challenges—including fidelity-utility trade-offs, dynamic environment explainability, and human-in-the-loop validation—and charts actionable research trajectories for next-generation interpretable RL systems.
📝 Abstract
Reinforcement Learning (RL) is a popular machine learning paradigm where intelligent agents interact with the environment to fulfill a long-term goal. Driven by the resurgence of deep learning, Deep RL (DRL) has witnessed great success over a wide spectrum of complex control tasks. Despite the encouraging results achieved, the deep neural network backbone is widely regarded as a black box that prevents practitioners from trusting and employing trained agents in realistic scenarios where high security and reliability are essential. To alleviate this issue, a large body of literature has been devoted to shedding light on the inner workings of intelligent agents, either by constructing intrinsic interpretability or by providing post-hoc explainability. In this survey, we provide a comprehensive review of existing work on eXplainable RL (XRL) and introduce a new taxonomy in which prior works are clearly categorized into model-explaining, reward-explaining, state-explaining, and task-explaining methods. We also review and highlight RL methods that conversely leverage human knowledge to improve the learning efficiency and performance of agents, a class of methods often overlooked in the XRL field. Challenges and opportunities in XRL are also discussed. This survey intends to provide a high-level summary of XRL and to motivate future research on more effective XRL solutions. Corresponding open-source code is collected and categorized at https://github.com/Plankson/awesome-explainable-reinforcement-learning.
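
As a concrete illustration of the post-hoc, state-explaining family covered by the survey, the sketch below computes gradient saliency over the input state of a trained policy network, i.e., which state features most influence the greedy action. The network architecture, state dimension, and PyTorch-based setup are illustrative assumptions and do not correspond to any particular method reviewed in the paper.

```python
# Minimal sketch (assumed setup): gradient-based saliency over state features
# of a policy network, a common post-hoc, state-explaining technique.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # action logits

def saliency(policy: PolicyNet, state: torch.Tensor) -> torch.Tensor:
    """Return |d logit_a / d state| for the greedy action a, a rough
    measure of which state features drive the agent's decision."""
    state = state.clone().requires_grad_(True)
    logits = policy(state)
    action = logits.argmax(dim=-1)
    logits.gather(-1, action.unsqueeze(-1)).sum().backward()
    return state.grad.abs()

# Usage: rank state features by their influence on the chosen action.
policy = PolicyNet(state_dim=4, num_actions=2)  # e.g., a CartPole-like task
s = torch.randn(1, 4)
print(saliency(policy, s))
```

Higher saliency values mark state features the policy is most sensitive to; in practice such maps are inspected by humans to check whether the agent attends to task-relevant information.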
Problem

Research questions and friction points this paper is trying to address.

Understanding the black-box nature of deep reinforcement learning agents
Surveying methods for explainable reinforcement learning (XRL)
Addressing challenges in trust and deployment of RL systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Survey on eXplainable RL concepts and algorithms
Taxonomy categorizing XRL methods along model-, reward-, state-, and task-explaining dimensions
Leveraging human knowledge to enhance RL efficiency (see the reward-shaping sketch after this list)
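
To make the "leveraging human knowledge" idea concrete, here is a minimal sketch of potential-based reward shaping, a classic mechanism for injecting a human prior into the reward signal while preserving the optimal policy. The Gymnasium environment, potential function, and hyperparameters are illustrative assumptions, not details taken from the surveyed paper.

```python
# Minimal sketch (assumed setup): potential-based reward shaping,
# r'(s, a, s') = r + gamma * phi(s') - phi(s), with a human-designed phi.
import gymnasium as gym

GAMMA = 0.99

def human_potential(obs) -> float:
    # Hypothetical human prior for MountainCar: prefer states closer to the
    # goal on the right, i.e., higher cart position.
    position = obs[0]
    return float(position)

env = gym.make("MountainCar-v0")
obs, _ = env.reset(seed=0)
total_shaped_reward = 0.0
for _ in range(200):
    action = env.action_space.sample()  # stand-in for a learning agent
    next_obs, reward, terminated, truncated, _ = env.step(action)
    # Shaped reward densifies feedback without changing the optimal policy.
    shaped = reward + GAMMA * human_potential(next_obs) - human_potential(obs)
    total_shaped_reward += shaped
    obs = next_obs
    if terminated or truncated:
        break
print(total_shaped_reward)
```

Because the shaping term is a potential difference, the human prior speeds up credit assignment while leaving the task's optimal behavior unchanged, which is one way such human-in-the-loop methods connect to interpretability.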
Authors
Yunpeng Qing
Zhejiang University
Reinforcement Learning
Shunyu Liu
Nanyang Technological University
Multi-Agent Learning, Reinforcement Learning, Large Language Models, Power System Control
Jie Song
College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Huiqiong Wang
College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Mingli Song
State Key Laboratory of Blockchain and Security, Zhejiang University, Hangzhou 310027, China