Scaling Tasks, Not Samples: Mastering Humanoid Control through Multi-Task Model-Based Reinforcement Learning

📅 2026-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inefficiency of existing robotic multi-skill learning approaches, which often rely on large-scale data and parameter scaling. Instead, the authors propose scaling the number of tasks rather than the per-task sample size, leveraging a shared dynamics model to aggregate multi-task experience for efficient whole-body control policy learning. The study provides the first systematic demonstration that task scaling acts as an effective regularizer in model-based reinforcement learning, revealing its structural advantages in dynamics modeling and sample efficiency while mitigating the gradient interference inherent in model-free methods. Building upon EfficientZero, the proposed multi-task algorithm EZ-M is trained online on HumanoidBench and significantly outperforms strong baselines without requiring extreme parameter scaling, thereby validating the efficacy and scalability of the task-scaling paradigm.

📝 Abstract
Developing generalist robots capable of mastering diverse skills remains a central challenge in embodied AI. While recent progress emphasizes scaling model parameters and offline datasets, such approaches are limited in robotics, where learning requires active interaction. We argue that effective online learning should scale the *number of tasks*, rather than the number of samples per task. This regime reveals a structural advantage of model-based reinforcement learning (MBRL). Because physical dynamics are invariant across tasks, a shared world model can aggregate multi-task experience to learn robust, task-agnostic representations. In contrast, model-free methods suffer from gradient interference when tasks demand conflicting actions in similar states. Task diversity therefore acts as a regularizer for MBRL, improving dynamics learning and sample efficiency. We instantiate this idea with **EfficientZero-Multitask (EZ-M)**, a sample-efficient multi-task MBRL algorithm for online learning. Evaluated on **HumanoidBench**, a challenging whole-body control benchmark, EZ-M achieves state-of-the-art performance with significantly higher sample efficiency than strong baselines, without extreme parameter scaling. These results establish task scaling as a critical axis for scalable robotic learning. The project website is available [here](https://yewr.github.io/ez_m/).
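The abstract's core contrast — conflicting policy gradients in model-free multi-task learning versus task-agnostic dynamics gradients in MBRL — can be illustrated with a minimal toy sketch. This is not the paper's algorithm, only a hypothetical linear example: two tasks see the same state but demand opposite actions, so their policy-loss gradients cancel, while both tasks observe the same physical transition and therefore push a shared dynamics model in the same direction.

```python
import numpy as np

rng = np.random.default_rng(0)
state = rng.normal(size=4)

# Two hypothetical tasks demanding opposite actions in the same state.
target_a, target_b = 1.0, -1.0

w = np.zeros(4)  # shared linear policy: action = w @ state

def policy_grad(w, s, target):
    # Gradient of 0.5 * (w @ s - target)^2 with respect to w.
    return (w @ s - target) * s

g_a = policy_grad(w, state, target_a)
g_b = policy_grad(w, state, target_b)
# Conflicting tasks: the gradients point in opposite directions and cancel,
# so the summed multi-task policy gradient carries no learning signal.
print(np.allclose(g_a + g_b, 0.0))  # True

# Dynamics are task-invariant: both tasks observe the same transition
# (s, a) -> s', so their dynamics-model gradients coincide instead of conflicting.
A_true = rng.normal(size=(4, 4))
action = 0.5
s_next = A_true @ state + action

def model_grad(A, s, a, s_next):
    # Gradient of 0.5 * ||A @ s + a - s_next||^2 with respect to A.
    return np.outer(A @ s + a - s_next, s)

A_hat = np.zeros((4, 4))
g_dyn_a = model_grad(A_hat, state, action, s_next)  # from task A's experience
g_dyn_b = model_grad(A_hat, state, action, s_next)  # from task B's experience
print(np.allclose(g_dyn_a, g_dyn_b))  # True
```

In this toy setting, adding tasks hurts the shared policy (signals cancel) but helps the shared model (signals accumulate), which is the intuition behind task scaling acting as a regularizer for MBRL.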
Problem

Research questions and friction points this paper is trying to address.

humanoid control
multi-task learning
model-based reinforcement learning
sample efficiency
embodied AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-task reinforcement learning
model-based reinforcement learning
task scaling
sample efficiency
humanoid control