AI Summary
Optimal policies for Markov decision processes (MDPs) are often large and unintelligible, hindering interpretability and deployment.
Method: This paper proposes an SMT-based policy compression framework that synthesizes compact, interpretable decision trees while guaranteeing optimality. It integrates abstraction-refinement with verification of families of Markov chains in a combined search scheme that explicitly constrains tree size. Policy synthesis is encoded as an SMT problem with semantic constraints, augmented by policy-space pruning and model-family verification to ensure efficiency and correctness.
Contribution/Results: Evaluated on benchmark MDPs with up to 10,000 states and 19-dimensional feature spaces, our method achieves up to 20× reduction in decision tree size while preserving near-optimal performance. The resulting trees significantly outperform those produced by state-of-the-art approaches in both compactness and fidelity to the optimal policy.
Abstract
Markov decision processes (MDPs) describe sequential decision-making processes; MDP policies return for every state in that process an advised action. Classical algorithms can efficiently compute policies that are optimal with respect to, e.g., reachability probabilities. However, these policies are then given in a tabular format. A longstanding challenge is to represent optimal or almost-optimal policies concisely, e.g., as decision trees. This paper makes two contributions towards this challenge: first, an SMT-based approach to encode a given (optimal) policy as a small decision tree, and second, an abstraction-refinement loop that searches for policies that are optimal within the set of policies that can be represented with a small tree. Technically, the latter combines the SMT encoding with verification approaches for families of Markov chains. The empirical evaluation demonstrates the feasibility of these approaches and shows how they can outperform the state-of-the-art on various benchmarks, yielding up to 20 times smaller trees representing (almost) optimal policies for models with up to 10k states and 19 variables.