Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning

📅 2025-02-27
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Current robust reinforcement learning (robust RL) lacks standardized, systematic benchmarks: existing evaluations are typically confined to isolated environments and single perturbation types, hindering comparability across algorithms and claims of generalization. Method: We introduce the first unified, modular benchmark for robust RL, covering perturbations to states, rewards, actions, and the environment itself, implemented across 60+ reproducible tasks in control, robotics, safety-critical, and multi-agent settings. Built on the Gymnasium ecosystem, it provides a fully composable perturbation-modeling framework supporting stochastic and adversarial perturbations, delays, noise injection, and user-defined perturbation combinations and evaluation protocols. Contribution/Results: Extensive experiments expose widespread fragility of mainstream RL and robust RL algorithms under cross-perturbation generalization, establishing a quantitative standard for robustness evaluation and enabling principled, large-scale assessment of algorithmic resilience.

📝 Abstract
Driven by inherent uncertainty and the sim-to-real gap, robust reinforcement learning (RL) seeks to improve resilience against the complexity and variability in agent-environment sequential interactions. Despite the existence of a large number of RL benchmarks, there is a lack of standardized benchmarks for robust RL. Current robust RL policies often focus on a specific type of uncertainty and are evaluated in distinct, one-off environments. In this work, we introduce Robust-Gymnasium, a unified modular benchmark designed for robust RL that supports a wide variety of disruptions across all key RL components: agents' observed states and rewards, agents' actions, and the environment. Offering over sixty diverse task environments spanning control and robotics, safe RL, and multi-agent RL, it provides an open-source and user-friendly tool for the community to assess current methods and foster the development of robust RL algorithms. In addition, we benchmark existing standard and robust RL algorithms within this framework, uncovering significant deficiencies in each and offering new insights.
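The disruption model described above, perturbations applied independently to observations, rewards, actions, or the environment, naturally maps onto stacked environment wrappers. The sketch below illustrates that pattern with hypothetical classes (`ToyEnv`, `ObservationNoise`, `ActionNoise`) over a minimal Gym-style `reset`/`step` interface; it is not the Robust-Gymnasium API, just an assumption-laden illustration of how composable perturbation modules can work.

```python
import random

class ToyEnv:
    """Minimal Gym-style environment (hypothetical): the state is one
    float and the agent tries to push it toward zero."""
    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.state = self.rng.uniform(-1.0, 1.0)
        return self.state, {}

    def step(self, action):
        self.state += action
        reward = -abs(self.state)          # closer to zero is better
        return self.state, reward, False, False, {}

class ObservationNoise:
    """Perturbation module: corrupts what the agent observes, while the
    true environment state evolves unperturbed."""
    def __init__(self, env, sigma=0.1, seed=0):
        self.env, self.sigma = env, sigma
        self.rng = random.Random(seed)

    def reset(self, seed=None):
        obs, info = self.env.reset(seed)
        return obs + self.rng.gauss(0.0, self.sigma), info

    def step(self, action):
        obs, reward, term, trunc, info = self.env.step(action)
        return obs + self.rng.gauss(0.0, self.sigma), reward, term, trunc, info

class ActionNoise:
    """Perturbation module: corrupts the action before it reaches the env."""
    def __init__(self, env, sigma=0.05, seed=1):
        self.env, self.sigma = env, sigma
        self.rng = random.Random(seed)

    def reset(self, seed=None):
        return self.env.reset(seed)

    def step(self, action):
        return self.env.step(action + self.rng.gauss(0.0, self.sigma))

# Composability: perturbation combinations are expressed by nesting
# wrappers -- here, noisy observations AND noisy actions at once.
env = ActionNoise(ObservationNoise(ToyEnv()))
obs, _ = env.reset(seed=42)
obs, reward, *_ = env.step(-0.5 * obs)
```

Because each module only touches one interface (observation, action, reward, or dynamics), stochastic versus adversarial variants, delays, or custom disturbances can be swapped in per module without changing the task or the agent.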
Problem

Research questions and friction points this paper is trying to address.

Lack of standardized benchmarks for robust RL
Need for a unified, modular evaluation framework
Assessing policy resilience under diverse RL disruptions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified, modular robust RL benchmark
Comprehensive support for diverse disruption types
Open-source tool for algorithm assessment