🤖 AI Summary
Meta-World, a widely adopted benchmark for multi-task and meta-reinforcement learning, has suffered from inconsistent versioning and outdated documentation, undermining the reproducibility and cross-algorithm comparability of published results. To address this, we systematically reconstruct and standardize the benchmark: (1) we achieve full reproducibility of historical results through deterministic environment initialization, modular task definitions, and CI/CD-based validation; (2) we unify the API and task-configuration paradigm, enabling fine-grained, customizable composition of task suites; and (3) we release an open-source, Gym-compatible Python implementation with explicit random-seed control and a modular architecture. The new version is publicly available as Farama-Foundation/Metaworld. This work moves benchmark design toward greater scientific rigor, substantially improving experimental reproducibility, cross-study comparability, and research efficiency, and establishing foundational infrastructure for fair and reliable reinforcement learning evaluation.
📝 Abstract
Meta-World is widely used for evaluating multi-task and meta-reinforcement learning agents, which are challenged to master diverse skills simultaneously. Since its introduction, however, there have been numerous undocumented changes that inhibit fair comparison of algorithms. This work strives to disambiguate these results from the literature, while also leveraging past versions of Meta-World to provide insights into multi-task and meta-reinforcement learning benchmark design. Through this process we release a new open-source version of Meta-World (https://github.com/Farama-Foundation/Metaworld/) that offers full reproducibility of past results, is more technically ergonomic, and gives users more control over the tasks included in a task set.