MO-Playground: Massively Parallelized Multi-Objective Reinforcement Learning for Robotics

📅 2026-03-10
🤖 AI Summary
Existing methods struggle to efficiently harness large-scale parallel computation, resulting in poor training efficiency for multi-objective reinforcement learning (MORL) in complex robotic tasks. This work proposes MORLAX, a novel algorithm, alongside MO-Playground, an environment platform that together enable the first GPU-native, massively parallel MORL framework. The system supports synchronous simulation across thousands of environments, dramatically accelerating convergence to the Pareto front. Evaluated on the BRUCE humanoid robot, the approach successfully learns Pareto-optimal locomotion policies spanning six realistic objectives. Compared to conventional CPU-based methods, it achieves speedups of 25–270× while yielding Pareto fronts with superior hypervolume, demonstrating both computational efficiency and enhanced solution quality.
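The core pattern behind this kind of GPU-native throughput is stepping all environments synchronously as batched array operations rather than one process per environment. A minimal sketch of that idea, using a hypothetical batched point-mass environment in NumPy (this is illustrative only and is not MO-Playground's actual API):

```python
import numpy as np

class BatchedPointMassEnv:
    """Hypothetical batch of 1-D point-mass environments stepped in lockstep.

    Illustrates the synchronous, vectorized stepping pattern that GPU-native
    frameworks scale to thousands of instances; not the paper's implementation.
    """

    def __init__(self, num_envs):
        self.pos = np.zeros(num_envs)
        self.vel = np.zeros(num_envs)

    def step(self, actions):
        # One synchronous physics step for every environment at once.
        self.vel += 0.1 * actions
        self.pos += 0.1 * self.vel
        # Vector-valued (multi-objective) reward: progress vs. control
        # effort, two objectives that naturally conflict.
        rewards = np.stack([self.pos, -actions ** 2], axis=1)
        obs = np.stack([self.pos, self.vel], axis=1)
        return obs, rewards

env = BatchedPointMassEnv(num_envs=4096)
obs, rew = env.step(np.ones(4096))
# obs and rew each have shape (4096, 2): one row per environment.
```

On a GPU the same array program runs unchanged with the batch dimension mapped to hardware parallelism, which is what makes thousands of concurrent environments cheap.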

📝 Abstract
Multi-objective reinforcement learning (MORL) is a powerful tool to learn Pareto-optimal policy families across conflicting objectives. However, unlike traditional RL algorithms, existing MORL algorithms do not effectively leverage large-scale parallelization to concurrently simulate thousands of environments, resulting in vastly increased computation time. Ultimately, this has limited MORL's application towards complex multi-objective robotics problems. To address these challenges, we present 1) MORLAX, a new GPU-native, fast MORL algorithm, and 2) MO-Playground, a pip-installable playground of GPU-accelerated multi-objective environments. Together, MORLAX and MO-Playground approximate Pareto sets within minutes, offering 25–270× speed-ups compared to legacy CPU-based approaches whilst achieving superior Pareto front hypervolumes. We demonstrate the versatility of our approach by implementing a custom BRUCE humanoid robot environment using MO-Playground and learning Pareto-optimal locomotion policies across 6 realistic objectives for BRUCE, such as smoothness, efficiency and arm swinging.
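Hypervolume, the metric used above to compare Pareto fronts, is the volume of objective space dominated by a front relative to a reference point; a larger value means a better spread of trade-offs. For two objectives (maximization) it reduces to a simple sweep over the sorted front. A minimal sketch of that standard computation (not the paper's implementation, which handles six objectives):

```python
import numpy as np

def hypervolume_2d(front, ref):
    """Area dominated by a 2-objective maximization front w.r.t. `ref`.

    `ref` must be worse than every point of interest in both objectives.
    """
    front = np.asarray(front, dtype=float).reshape(-1, 2)
    ref = np.asarray(ref, dtype=float)
    # Keep only points that strictly dominate the reference point.
    front = front[np.all(front > ref, axis=1)]
    if len(front) == 0:
        return 0.0
    # Sort by the first objective, descending, then sweep: each point
    # contributes a rectangular slice above the best height seen so far.
    front = front[np.argsort(-front[:, 0])]
    hv, prev_y = 0.0, ref[1]
    for x, y in front:
        if y > prev_y:  # dominated points add no area
            hv += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return hv

# Three mutually non-dominated points form a staircase of area 6.
print(hypervolume_2d([[1, 3], [2, 2], [3, 1]], ref=[0, 0]))  # → 6.0
```

Higher-dimensional hypervolume is substantially more expensive to compute exactly, which is one reason efficient Pareto-front approximation matters in practice.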
Problem

Research questions and friction points this paper is trying to address.

multi-objective reinforcement learning
massively parallelized simulation
robotics
Pareto-optimal policies
computation efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-objective reinforcement learning
GPU acceleration
Massively parallelized simulation
Pareto-optimal policy
Robotics