Meta-World+: An Improved, Standardized, RL Benchmark

📅 2025-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Meta-World, a widely adopted benchmark for multi-task and meta-reinforcement learning, has suffered from inconsistent versioning and outdated documentation, undermining reproducibility and cross-algorithm comparability. To address this, we systematically reconstruct and standardize the benchmark: (1) we achieve full reproducibility of historical results through deterministic environment initialization, modular task definitions, and CI/CD-based validation; (2) we unify the API and task-configuration scheme, enabling fine-grained, customizable task-suite composition; and (3) we release an open-source, Gym-compatible Python implementation with explicit random-seed control and a modular architecture. The new version is publicly available as Farama-Foundation/Metaworld. This work moves benchmark design toward greater scientific rigor, improving experimental reproducibility, cross-study comparability, and research efficiency, and establishing infrastructure for fair and reliable reinforcement learning evaluation.
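The summary's point (3), explicit random-seed control, can be illustrated with a minimal, self-contained sketch in the spirit of the Gymnasium `reset(seed=...)` convention. `ToyEnv` is a hypothetical stand-in for illustration only, not the Meta-World API:

```python
import random


class ToyEnv:
    """Hypothetical stand-in environment (NOT the Meta-World API) showing
    explicit seed control in the spirit of Gymnasium's reset(seed=...)."""

    def __init__(self):
        self._rng = random.Random()

    def reset(self, seed=None):
        # Re-seeding the environment's private RNG makes the initial state,
        # and hence the episode that follows, fully deterministic.
        if seed is not None:
            self._rng = random.Random(seed)
        return self._rng.random()  # toy "initial observation"


env = ToyEnv()
a = env.reset(seed=42)
b = env.reset(seed=42)
assert a == b  # same seed, same initial state: reproducible runs
```

Giving each environment its own RNG stream, rather than relying on the global `random` state, is what keeps initialization deterministic even when other code draws random numbers in between.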

📝 Abstract
Meta-World is widely used for evaluating multi-task and meta-reinforcement learning agents, which are challenged to master diverse skills simultaneously. Since its introduction, however, there have been numerous undocumented changes that inhibit a fair comparison of algorithms. This work strives to disambiguate these results from the literature, while also leveraging the past versions of Meta-World to provide insights into multi-task and meta-reinforcement learning benchmark design. Through this process we release a new open-source version of Meta-World (https://github.com/Farama-Foundation/Metaworld/) that has full reproducibility of past results, is more technically ergonomic, and gives users more control over the tasks that are included in a task set.
Problem

Research questions and friction points this paper is trying to address.

Resolves inconsistencies in the Meta-World benchmark to enable fair algorithm comparisons
Improves reproducibility and usability of multi-task RL evaluation
Enhances task customization in Meta-World+ for better benchmark design
Innovation

Methods, ideas, or system contributions that make the work stand out.

Standardized RL benchmark for fair comparisons
Open-source version with full reproducibility
Enhanced user control over task sets
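The "user control over task sets" idea can be sketched as composing a suite from an explicit task registry rather than relying on fixed splits such as MT10/MT50. The registry, task names, and `make_suite` helper below are hypothetical illustrations, not the actual Metaworld API:

```python
# Hypothetical task registry (names and factories are illustrative only,
# not the real Metaworld registry): task name -> environment factory.
TASKS = {
    "reach": lambda: "ReachEnv",
    "push": lambda: "PushEnv",
    "pick-place": lambda: "PickPlaceEnv",
    "door-open": lambda: "DoorOpenEnv",
}


def make_suite(names):
    """Compose a custom task suite from an explicit list of task names,
    failing loudly on unknown entries instead of silently skipping them."""
    unknown = [n for n in names if n not in TASKS]
    if unknown:
        raise ValueError(f"unknown tasks: {unknown}")
    return {n: TASKS[n]() for n in names}


suite = make_suite(["reach", "push"])  # a custom two-task suite
```

Validating names up front matters for benchmark design: a typo in a task list should be an error, not a quietly smaller suite that skews reported results.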