SPOT: Scalable Policy Optimization with Trees for Markov Decision Processes

📅 2025-10-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Interpretable policy optimization in Markov decision processes (MDPs) suffers from computational intractability and poor scalability. Method: We propose a verifiable decision-tree-based policy learning framework that formulates decision-tree policy optimization as a mixed-integer linear program (MILP). To address complexity, we design a dimensionality-reduced branch-and-bound algorithm that explicitly decouples MDP dynamic constraints from tree-structure constraints, enabling efficient parallel search while guaranteeing global optimality of the learned decision tree at each iteration. Contribution/Results: Our method achieves an order-of-magnitude speedup over state-of-the-art approaches on standard benchmarks, scales to significantly larger MDPs, and yields policies that simultaneously achieve high performance, compact representation, and strong interpretability, making them suitable for high-stakes decision-making domains.
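The decoupling idea in the summary can be illustrated with a toy sketch. This is not the paper's algorithm or code: the MDP, the feature values, the depth-1 tree class, and the exhaustive outer search below are all invented for illustration. The point it shows is the separation of concerns the summary describes: an outer search ranges over tree structures only, while each candidate tree is scored by an independent MDP policy evaluation.

```python
# Illustrative sketch (not the paper's implementation): searching over
# depth-1 decision-tree policies for a toy MDP. The tree-structure search
# (outer loop) is decoupled from the MDP evaluation (inner loop).
import itertools

# Toy MDP: 4 states, each described by one feature; 2 actions.
GAMMA = 0.9
FEATURES = [0.0, 1.0, 2.0, 3.0]                      # feature value of each state
P = {
    0: [[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]],  # action 0: "left"
    1: [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1]],  # action 1: "right"
}
R = {0: [0, 0, 0, 0], 1: [0, 0, 0, 1]}               # reward for reaching last state

def evaluate(policy, iters=500):
    """Inner problem: iterative policy evaluation; returns average state value."""
    V = [0.0] * 4
    for _ in range(iters):
        V = [R[policy[s]][s]
             + GAMMA * sum(P[policy[s]][s][t] * V[t] for t in range(4))
             for s in range(4)]
    return sum(V) / 4

def tree_policy(threshold, a_left, a_right):
    """Depth-1 tree: take a_left if feature <= threshold, else a_right."""
    return [a_left if f <= threshold else a_right for f in FEATURES]

# Outer problem: search over tree structures only (here, brute force;
# the paper uses branch-and-bound over this space instead).
best_val, best_tree = float("-inf"), None
for threshold, a_l, a_r in itertools.product([0.5, 1.5, 2.5], [0, 1], [0, 1]):
    val = evaluate(tree_policy(threshold, a_l, a_r))
    if val > best_val:
        best_val, best_tree = val, (threshold, a_l, a_r)

print(best_tree)  # → (0.5, 1, 1)
```

Because each candidate tree is evaluated independently, the outer search parallelizes naturally, which is the property the summary attributes to the decoupled branch-and-bound design.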

📝 Abstract
Interpretable reinforcement learning policies are essential for high-stakes decision-making, yet optimizing decision tree policies in Markov Decision Processes (MDPs) remains challenging. We propose SPOT, a novel method for computing decision tree policies, which formulates the optimization problem as a mixed-integer linear program (MILP). To enhance efficiency, we employ a reduced-space branch-and-bound approach that decouples the MDP dynamics from tree-structure constraints, enabling efficient parallel search. This significantly improves runtime and scalability compared to previous methods. Our approach ensures that each iteration yields the optimal decision tree. Experimental results on standard benchmarks demonstrate that SPOT achieves substantial speedup and scales to larger MDPs with a significantly higher number of states. The resulting decision tree policies are interpretable and compact, maintaining transparency without compromising performance. These results demonstrate that our approach simultaneously achieves interpretability and scalability, delivering high-quality policies an order of magnitude faster than existing approaches.
Problem

Research questions and friction points this paper is trying to address.

Optimizing interpretable decision tree policies for MDPs
Solving policy optimization via mixed-integer linear programming
Enhancing scalability and speed for large state spaces
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizes decision trees via mixed-integer linear programming
Decouples MDP dynamics from tree constraints for efficiency
Ensures optimal decision trees with parallel search scalability
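One way to see why decoupling MDP dynamics from tree constraints helps: once a candidate tree fixes the policy π, the dynamics side reduces to the linear Bellman system (I − γP^π)V = R^π, solvable exactly with no integer variables involved. The sketch below is a hedged illustration, not the paper's formulation; the 3-state chain, its transition matrix, and all names are made-up assumptions.

```python
# Hedged illustration: with the policy fixed by a candidate tree, policy
# evaluation is the linear system (I - gamma * P_pi) V = R_pi.
GAMMA = 0.95
# Assumed fixed-policy transition matrix and reward vector (toy 3-state chain).
P_pi = [[0.9, 0.1, 0.0],
        [0.0, 0.9, 0.1],
        [0.0, 0.0, 1.0]]
R_pi = [0.0, 0.0, 1.0]

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

n = 3
A = [[(1.0 if i == j else 0.0) - GAMMA * P_pi[i][j] for j in range(n)]
     for i in range(n)]
V = solve_linear(A, R_pi)   # V[2] is 1/(1-0.95) = 20, the absorbing state's value
```

Since this inner system involves no tree-structure variables, each branch-and-bound node only has to reason about the (much smaller) discrete structure space, which is the source of the scalability claimed above.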
Authors
Xuyuan Xiong (Shanghai Jiao Tong University)
Pedro Chumpitaz-Flores (University of South Florida)
Kaixun Hua (Assistant Professor, University of South Florida; Trustworthy AI, Clustering, Global Optimization)
Cheng Hua (Shanghai Jiao Tong University)