🤖 AI Summary
Problem: Research on agents in non-stationary Markov decision processes (NS-MDPs) lacks standardized, reproducible benchmarks for evaluating robustness. Method: We introduce NS-Gym, the first open-source simulation toolkit for NS-MDPs, integrated with Gymnasium and designed for systematic assessment of adaptive reinforcement learning algorithms under non-stationarity. Contribution/Results: NS-Gym provides (1) a standardized NS-MDP interface and a curated benchmark suite; (2) a modular architecture that decouples environment dynamics (including abrupt changes, gradual drifts, and periodic variations) from agent policies, enabling reproducible and scalable evaluation of adaptivity; and (3) an empirical comparison of six algorithmic approaches from prior work on NS-MDPs across diverse non-stationary regimes. By unifying evaluation protocols, NS-Gym aims to improve the rigor, comparability, and reproducibility of non-stationary RL research and to serve as shared infrastructure for work on robust, adaptive decision-making.
📝 Abstract
In many real-world applications, agents must make sequential decisions in environments whose conditions change due to exogenous factors. These non-stationary environments pose significant challenges to traditional decision-making models, which typically assume stationary dynamics. Non-stationary Markov decision processes (NS-MDPs) offer a framework to model and solve decision problems under such changing conditions. However, the lack of standardized benchmarks and simulation tools has hindered systematic evaluation and progress in this field. We present NS-Gym, the first simulation toolkit designed explicitly for NS-MDPs, integrated within the popular Gymnasium framework. In NS-Gym, we separate the evolution of the environmental parameters that characterize non-stationarity from the agent's decision-making module, allowing for modular and flexible adaptations to dynamic environments. We review prior work in this domain and present a toolkit encapsulating key problem characteristics and types in NS-MDPs. This toolkit is the first effort to develop a set of standardized interfaces and benchmark problems to enable consistent and reproducible evaluation of algorithms under non-stationary conditions. We also benchmark six algorithmic approaches from prior work on NS-MDPs using NS-Gym. Our vision is that NS-Gym will enable researchers to assess the adaptability and robustness of their decision-making algorithms to non-stationary conditions.
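The separation described in the abstract, where environmental parameters evolve independently of the agent's decision-making, can be illustrated with a small sketch. This is not the NS-Gym API: every name below (`ChainEnv`, `NonStationaryWrapper`, `abrupt_change`, `gradual_drift`, `slip_prob`) is hypothetical, and the code uses only the Python standard library. The toy environment merely mimics Gymnasium's `reset()` and `step()` return conventions.

```python
import random
from typing import Callable, Dict

# Hypothetical illustration only; none of these names come from NS-Gym.

def abrupt_change(at_step: int, new_value: float) -> Callable[[float, int], float]:
    """Schedule that jumps to new_value once, at time step at_step."""
    def update(value: float, t: int) -> float:
        return new_value if t == at_step else value
    return update

def gradual_drift(delta: float, lo: float = 0.0, hi: float = 1.0) -> Callable[[float, int], float]:
    """Schedule that adds delta every step, clipped to [lo, hi]."""
    def update(value: float, t: int) -> float:
        return min(hi, max(lo, value + delta))
    return update

class ChainEnv:
    """Toy stationary MDP: walk along a chain; the last cell is the goal."""
    def __init__(self, length: int = 5, slip_prob: float = 0.0, seed: int = 0):
        self.length = length
        self.slip_prob = slip_prob  # probability that the chosen move is reversed
        self.rng = random.Random(seed)
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos, {}

    def step(self, action: int):
        move = 1 if action == 1 else -1
        if self.rng.random() < self.slip_prob:
            move = -move
        self.pos = min(self.length - 1, max(0, self.pos + move))
        terminated = self.pos == self.length - 1
        reward = 1.0 if terminated else 0.0
        # Same 5-tuple shape as Gymnasium: obs, reward, terminated, truncated, info
        return self.pos, reward, terminated, False, {}

class NonStationaryWrapper:
    """Applies parameter schedules before each step; the agent code is untouched."""
    def __init__(self, env, schedules: Dict[str, Callable[[float, int], float]]):
        self.env = env
        self.schedules = schedules
        self.initial = {name: getattr(env, name) for name in schedules}
        self.t = 0

    def reset(self):
        self.t = 0
        for name, value in self.initial.items():
            setattr(self.env, name, value)  # restore the initial parameters
        return self.env.reset()

    def step(self, action: int):
        self.t += 1
        for name, update in self.schedules.items():
            setattr(self.env, name, update(getattr(self.env, name), self.t))
        obs, reward, terminated, truncated, info = self.env.step(action)
        params = {name: getattr(self.env, name) for name in self.schedules}
        return obs, reward, terminated, truncated, dict(info, t=self.t, params=params)

# Usage: the slip probability drifts upward while the agent always moves right.
env = NonStationaryWrapper(ChainEnv(length=50), {"slip_prob": gradual_drift(0.1, hi=0.3)})
obs, info = env.reset()
history = []
for _ in range(5):
    obs, reward, terminated, truncated, info = env.step(1)
    history.append(info["params"]["slip_prob"])
```

Because the schedules live in the wrapper, the same agent loop can be evaluated under an abrupt change or a gradual drift by swapping one dictionary entry. Whether the agent is told about the current parameter values (here exposed through `info`) is a design choice of this sketch.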