🤖 AI Summary
To address the limited interpretability of reinforcement learning (RL) policies, this paper proposes an automated explanation-generation method grounded in Linear Temporal Logic (LTL). The method uses the divergence between the action distribution of the target policy and that of a policy optimized for a candidate LTL formula as the guiding signal for the search over formulae. This avoids overly general “catch-all” explanations and keeps each explanation specific to the policy and formally verifiable. Using Monte Carlo policy optimization, the approach is demonstrated in three simulated domains (capture-the-flag, car parking, and robot navigation), where it produces concise, human-readable LTL explanations. The reported experiments indicate that the method outperforms existing baselines in explanation accuracy and fidelity, which supports its practical utility for interpretable RL.
📝 Abstract
Explaining reinforcement learning policies is important for deploying them in real-world scenarios. We introduce a set of linear temporal logic formulae designed to provide such explanations, and an algorithm for searching through those formulae for the one that best explains a given policy. Our key idea is to compare action distributions from the target policy with those from policies optimized for candidate explanations. This comparison provides more insight into the target policy than existing methods and avoids inference of “catch-all” explanations. We demonstrate our method in a simulated game of capture-the-flag, a car-parking environment, and a robot navigation task.
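The comparison of action distributions described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not name the divergence measure, so KL divergence is assumed here, and the states, action probabilities, and candidate formulae (`F goal`, `G !obstacle`) are hypothetical values chosen only for illustration. Each candidate formula is assumed to come with a policy already optimized for it.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same action set.
    KL is an assumed choice; the source does not specify the measure."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def mean_divergence(target_policy, candidate_policy, states):
    """Average divergence between two policies over a set of sampled states."""
    total = sum(kl_divergence(target_policy[s], candidate_policy[s])
                for s in states)
    return total / len(states)

def best_explanation(target_policy, candidates, states):
    """Return the candidate formula whose optimized policy is closest to the
    target policy, together with the score of every candidate."""
    scores = {formula: mean_divergence(target_policy, policy, states)
              for formula, policy in candidates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical data: two states, two actions (left, right).
states = ["s0", "s1"]
target_policy = {"s0": [0.9, 0.1], "s1": [0.2, 0.8]}

# Hypothetical candidate formulae, each mapped to the action distributions
# of a policy assumed to have been optimized for that formula.
candidates = {
    "F goal": {"s0": [0.85, 0.15], "s1": [0.25, 0.75]},
    "G !obstacle": {"s0": [0.5, 0.5], "s1": [0.5, 0.5]},
}

best, scores = best_explanation(target_policy, candidates, states)
print(best, scores)
```

In this toy setup the second candidate induces a uniform policy in every state, so it receives a higher divergence and is not selected. This loosely mirrors how a comparison of action distributions can penalize a “catch-all” explanation that is consistent with the target policy but does not account for its specific action choices.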