🤖 AI Summary
This paper addresses online robust planning under model uncertainty. Existing robust Markov decision process (RMDP) approaches provide theoretical guarantees but suffer from high computational overhead, hindering real-time deployment; conversely, generative models learned from finite samples often incur approximation errors that compromise safety and performance. To bridge this gap, we propose Robust Sparse Sampling (RSS), the first RMDP online planning algorithm to integrate sample average approximation (SAA) with sparse trajectory sampling for efficient and robust value function estimation; its computational complexity is independent of the state-space size. We establish finite-sample convergence guarantees for RSS under mild assumptions. Empirical evaluations demonstrate that RSS significantly improves policy robustness and safety over standard Sparse Sampling, effectively breaking the traditional trade-off between real-time responsiveness and safety assurance.
📄 Abstract
Online planning in Markov Decision Processes (MDPs) enables agents to make sequential decisions by simulating future trajectories from the current state, making it well-suited for large-scale or dynamic environments. Sample-based methods such as Sparse Sampling and Monte Carlo Tree Search (MCTS) are widely adopted for their ability to approximate optimal actions using a generative model. However, in practical settings, the generative model is often learned from limited data, introducing approximation errors that can degrade performance or lead to unsafe behaviors. To address these challenges, Robust MDPs (RMDPs) offer a principled framework for planning under model uncertainty, yet existing approaches are typically computationally intensive and not suited for real-time use. In this work, we introduce Robust Sparse Sampling (RSS), the first online planning algorithm for RMDPs with finite-sample theoretical performance guarantees. Unlike Sparse Sampling, which estimates the nominal value function, RSS computes a robust value function by leveraging the efficiency and theoretical properties of Sample Average Approximation (SAA), enabling tractable robust policy computation in online settings. RSS is applicable to infinite or continuous state spaces, and its sample and computational complexities are independent of the state space size. We provide theoretical performance guarantees and empirically show that RSS outperforms standard Sparse Sampling in environments with uncertain dynamics.
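To make the idea concrete, here is a minimal sketch of a robust sparse-sampling backup. The abstract does not specify the paper's ambiguity set or exact SAA formulation, so this example makes illustrative assumptions: a total-variation ball of radius `delta` around the empirical distribution of sampled returns (whose worst case shifts probability mass onto the smallest sample), a binary action set, and a hypothetical generative model `step(state, action) -> (next_state, reward)`. It is a sketch of the general technique, not the paper's algorithm.

```python
def robust_mean(values, delta):
    """Worst-case mean over a total-variation ball of radius `delta`
    around the empirical distribution of `values`: up to `delta` of
    probability mass is moved from the largest samples onto the
    smallest one (an illustrative ambiguity set, not the paper's)."""
    n = len(values)
    vals = sorted(values)
    weights = [1.0 / n] * n
    budget = delta
    for i in range(n - 1, 0, -1):  # take mass from the largest values...
        take = min(weights[i], budget)
        weights[i] -= take
        weights[0] += take         # ...and pile it onto the smallest
        budget -= take
        if budget <= 0:
            break
    return sum(w * v for w, v in zip(weights, vals))


def rss_value(state, depth, C, delta, gamma, step):
    """Depth-limited robust sparse sampling (sketch).

    `step(state, action)` is an assumed generative model returning
    (next_state, reward); actions are assumed to be {0, 1}.  Each
    action's value is an SAA robust backup over C sampled successors,
    so the cost is independent of the state-space size.
    """
    if depth == 0:
        return 0.0
    best = float("-inf")
    for action in (0, 1):
        samples = [
            r + gamma * rss_value(nxt, depth - 1, C, delta, gamma, step)
            for nxt, r in (step(state, action) for _ in range(C))
        ]
        best = max(best, robust_mean(samples, delta))  # robust backup
    return best
```

With `delta = 0` the robust backup reduces to the ordinary sample mean, recovering standard Sparse Sampling; larger `delta` prices in model error by weighting pessimistic outcomes more heavily.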