Rule-Guided Reinforcement Learning Policy Evaluation and Improvement

📅 2025-03-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the poor interpretability of deep reinforcement learning (DRL) policies and the difficulty of localizing their defects. To this end, the authors propose a domain-knowledge-guided, rule-based framework. Methodologically, they adapt metamorphic relations from software testing to RL, integrating rule mining, symbolic-neural hybrid evaluation, and rule injection to enable automatic discovery and semantic attribution of policy weaknesses. The key contribution is an interpretable rule-policy co-design paradigm that supports semantic-level diagnosis and targeted repair of policy defects. Evaluated across eleven standard RL benchmarks, the framework identifies human-interpretable policy deficiencies and consistently improves cumulative reward, demonstrating its effectiveness, generality, and cross-environment transferability.

📝 Abstract
We consider the challenging problem of using domain knowledge to improve deep reinforcement learning policies. To this end, we propose LEGIBLE, a novel approach, following a multi-step process, which starts by mining rules from a deep RL policy, constituting a partially symbolic representation. These rules describe which decisions the RL policy makes and which it avoids making. In the second step, we generalize the mined rules using domain knowledge expressed as metamorphic relations. We adapt these relations from software testing to RL to specify expected changes of actions in response to changes in observations. The third step is evaluating generalized rules to determine which generalizations improve performance when enforced. These improvements show weaknesses in the policy, where it has not learned the general rules and thus can be improved by rule guidance. LEGIBLE supported by metamorphic relations provides a principled way of expressing and enforcing domain knowledge about RL environments. We show the efficacy of our approach by demonstrating that it effectively finds weaknesses, accompanied by explanations of these weaknesses, in eleven RL environments and by showcasing that guiding policy execution with rules improves performance w.r.t. gained reward.
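The second step above hinges on metamorphic relations: expected changes of actions in response to changes in observations. A minimal sketch of checking one such relation is shown below; the policy, the mirror transforms, and the CartPole-like observation layout are all hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

def mirror_obs(obs):
    """Hypothetical metamorphic transform: negate all components
    of a CartPole-like observation, mirroring the scene."""
    return -np.asarray(obs)

def mirror_action(action, n_actions=2):
    """Map a discrete action to its mirror counterpart
    (assumes a symmetric two-action space: 0 <-> 1)."""
    return n_actions - 1 - action

def violates_relation(policy, obs):
    """Check one metamorphic relation: a mirrored observation
    should yield the mirrored action. A violation flags a
    candidate policy weakness that rule guidance could repair."""
    return policy(mirror_obs(obs)) != mirror_action(policy(obs))

# Toy policy: push right exactly when the pole leans right (obs[2] > 0).
policy = lambda obs: int(obs[2] > 0)

print(violates_relation(policy, np.array([0.0, 0.1, 0.05, -0.2])))  # → False
```

In this sketch the toy policy happens to satisfy the relation; a learned policy that fails such checks on many states would be a candidate for the rule-enforcement step.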
Problem

Research questions and friction points this paper is trying to address.

Improving deep reinforcement learning policies using domain knowledge.
Mining and generalizing rules from RL policies for better performance.
Evaluating and enforcing rules to identify and address policy weaknesses.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mining rules from deep RL policies
Generalizing rules using metamorphic relations
Enforcing domain knowledge to improve RL performance
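The enforcement idea in the last bullet can be sketched as a thin wrapper that lets mined rules override the policy's action when their conditions fire. The rule format (condition predicate plus action) and the toy CartPole-style rule are assumptions for illustration, not the paper's actual interface.

```python
def rule_guided_step(policy, rules, obs):
    """Hypothetical rule-injection wrapper: if any mined rule's
    condition matches the observation, that rule's action overrides
    the base policy; otherwise the policy acts as usual."""
    for condition, action in rules:
        if condition(obs):
            return action
    return policy(obs)

# Toy rule: "if the pole leans far left, always push left."
rules = [(lambda obs: obs[2] < -0.15, 0)]
policy = lambda obs: 1  # degenerate policy that always pushes right

print(rule_guided_step(policy, rules, [0.0, 0.0, -0.2, 0.0]))  # → 0 (rule fires)
print(rule_guided_step(policy, rules, [0.0, 0.0, 0.1, 0.0]))   # → 1 (policy acts)
```

Comparing cumulative reward with and without such a wrapper is one simple way to quantify whether an enforced rule constitutes an actual improvement.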