SYMPOL: Symbolic Tree-Based On-Policy Reinforcement Learning

📅 2024-08-16
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
The black-box nature of neural policies severely hinders the trustworthy deployment of reinforcement learning (RL). To address this, we propose the first differentiable, axis-aligned decision tree policy framework compatible with end-to-end gradient-based training, overcoming the fundamental limitation that traditional symbolic tree policies are non-differentiable and therefore difficult to optimize. Methodologically, the approach integrates policy gradient optimization with a soft routing mechanism, fully parameterizing the tree structure, axis-aligned splitting directions, and decision-making through a continuous, differentiable relaxation. This makes symbolic tree policies trainable end-to-end within standard on-policy algorithms such as PPO for the first time. Experiments across multiple benchmark RL tasks show that the method outperforms existing tree-based RL approaches, achieving state-of-the-art results in both policy effectiveness and human interpretability. This work establishes a new paradigm for interpretable reinforcement learning.
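The soft-routing relaxation described above can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the class name, parameter shapes, and initialization are assumptions. The key idea it demonstrates is that each internal node relaxes its axis-aligned feature choice to a softmax over features and its hard split to a sigmoid, so every leaf receives a differentiable path probability and the policy is a smooth mixture of leaf action distributions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class SoftTreePolicy:
    """Minimal soft (differentiable) axis-aligned decision tree policy.

    Each internal node holds feature-selection logits (relaxed to a
    softmax over input features, recovering an axis-aligned split in
    the hard limit) and a threshold; each leaf holds action logits.
    Nodes are stored as a complete binary tree: children of node i
    are 2i+1 (left) and 2i+2 (right).
    """

    def __init__(self, depth, n_features, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.depth = depth
        self.n_inner = 2 ** depth - 1
        self.n_leaves = 2 ** depth
        self.feat_logits = rng.normal(size=(self.n_inner, n_features))
        self.thresholds = rng.normal(size=self.n_inner)
        self.leaf_logits = rng.normal(size=(self.n_leaves, n_actions))

    def action_probs(self, obs):
        # Soft feature selection, then probability of routing *right*
        # at every internal node.
        feat = np.array([softmax(l) @ obs for l in self.feat_logits])
        p_right = sigmoid(feat - self.thresholds)

        # Path probability of each leaf = product of routing
        # probabilities along its root-to-leaf path.
        leaf_probs = np.ones(self.n_leaves)
        for leaf in range(self.n_leaves):
            node = 0
            for d in range(self.depth):
                go_right = (leaf >> (self.depth - 1 - d)) & 1
                leaf_probs[leaf] *= p_right[node] if go_right else 1.0 - p_right[node]
                node = 2 * node + 1 + go_right

        # Policy = mixture of leaf action distributions, weighted by
        # path probabilities; fully differentiable in all parameters.
        return leaf_probs @ np.array([softmax(l) for l in self.leaf_logits])
```

Because every operation is smooth, the resulting action distribution can be plugged into a standard policy gradient objective; after training, rounding the softmax feature choices and sigmoid splits back to hard decisions yields an ordinary, interpretable axis-aligned tree.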

📝 Abstract
Reinforcement learning (RL) has seen significant success across various domains, but its adoption is often limited by the black-box nature of neural network policies, which makes them difficult to interpret. In contrast, symbolic policies represent decision-making strategies in a compact and interpretable way. However, learning symbolic policies directly within on-policy methods remains challenging. In this paper, we introduce SYMPOL, a novel method for SYMbolic tree-based on-POLicy RL. SYMPOL employs a tree-based model integrated with a policy gradient method, enabling the agent to learn and adapt its actions while maintaining a high level of interpretability. We evaluate SYMPOL on a set of benchmark RL tasks, demonstrating its superiority over alternative tree-based RL approaches in terms of performance and interpretability. To the best of our knowledge, this is the first method that allows gradient-based end-to-end learning of interpretable, axis-aligned decision trees within existing on-policy RL algorithms. Therefore, SYMPOL can become the foundation for a new class of interpretable RL based on decision trees. Our implementation is available at: https://github.com/s-marton/SYMPOL
Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning
Neural Network Opacity
Practical Application Limitations
Innovation

Methods, ideas, or system contributions that make the work stand out.

SYMPOL
Decision Trees
Reinforcement Learning