Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models

📅 2025-04-15
📈 Citations: 1
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing unsupervised reinforcement learning (URL) methods face three key bottlenecks on complex downstream tasks: they require task-specific fine-tuning, depend on datasets with task-relevant coverage, or pre-train with objectives that are weakly aligned with the tasks of interest. This paper introduces Meta Motivo, the first behavioral foundation model for zero-shot whole-body humanoid control. Its core is a forward-backward representation learning framework coupled with conditional-policy regularization, which aligns URL with behavioral priors extracted from unlabeled motion-capture (mocap) data and enables zero-shot generalization to both reward-driven and imitation tasks. The method integrates contrastive representation learning, a latent-conditional discriminator, trajectory embedding, and meta-policy modeling, trained end-to-end solely on observation-only mocap data. Experiments show that Meta Motivo matches specialized methods on motion tracking, goal reaching, and reward-optimization tasks, while significantly outperforming state-of-the-art URL and model-based baselines.
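To make the "zero-shot reward-driven" mechanism concrete, here is a minimal sketch of how forward-backward (FB) representations turn a reward function into a task prompt: the latent is estimated as z ≈ E[B(s) · r(s)] over sampled states, and the critic factorizes as Q(s, a; z) = F(s, a, z) · z. All names here (`infer_task_latent`, `q_value`, the toy embedding) are illustrative assumptions, not the paper's code.

```python
# Hedged sketch of zero-shot reward inference with FB representations.
# Assumption: `backward` maps a state to its backward embedding B(s);
# the paper trains B jointly with a forward map F, which we only use
# through its inner product with z.

from typing import Callable, List, Sequence

def infer_task_latent(
    backward: Callable[[Sequence[float]], List[float]],
    states: List[Sequence[float]],
    reward: Callable[[Sequence[float]], float],
) -> List[float]:
    """Estimate the task latent z ~= mean of B(s) * r(s) over sampled states."""
    dim = len(backward(states[0]))
    z = [0.0] * dim
    for s in states:
        b, r = backward(s), reward(s)
        for i in range(dim):
            z[i] += b[i] * r / len(states)
    return z

def q_value(forward_sa: Sequence[float], z: Sequence[float]) -> float:
    """FB critic decomposition: Q(s, a; z) = F(s, a, z) . z."""
    return sum(f * zi for f, zi in zip(forward_sa, z))

# Toy usage: identity embedding, reward = "first coordinate is positive".
z = infer_task_latent(
    backward=lambda s: list(s),
    states=[[1.0, 0.0], [-1.0, 0.0]],
    reward=lambda s: 1.0 if s[0] > 0 else 0.0,
)
# z averages B(s) over reward-bearing states; the pretrained policy
# conditioned on this z is then deployed with no further training.
```

The point of the sketch is that "prompting" the model with a reward costs one pass over a state sample, with no gradient steps, which is what makes the generalization zero-shot.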

πŸ“ Abstract
Unsupervised reinforcement learning (RL) aims at pre-training agents that can solve a wide range of downstream tasks in complex environments. Despite recent advancements, existing approaches suffer from several limitations: they may require running an RL process on each downstream task to achieve a satisfactory performance, they may need access to datasets with good coverage or well-curated task-specific samples, or they may pre-train policies with unsupervised losses that are poorly correlated with the downstream tasks of interest. In this paper, we introduce a novel algorithm regularizing unsupervised RL towards imitating trajectories from unlabeled behavior datasets. The key technical novelty of our method, called Forward-Backward Representations with Conditional-Policy Regularization, is to train forward-backward representations to embed the unlabeled trajectories to the same latent space used to represent states, rewards, and policies, and use a latent-conditional discriminator to encourage policies to ``cover'' the states in the unlabeled behavior dataset. As a result, we can learn policies that are well aligned with the behaviors in the dataset, while retaining zero-shot generalization capabilities for reward-based and imitation tasks. We demonstrate the effectiveness of this new approach in a challenging humanoid control problem: leveraging observation-only motion capture datasets, we train Meta Motivo, the first humanoid behavioral foundation model that can be prompted to solve a variety of whole-body tasks, including motion tracking, goal reaching, and reward optimization. The resulting model is capable of expressing human-like behaviors and it achieves competitive performance with task-specific methods while outperforming state-of-the-art unsupervised RL and model-based baselines.
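The abstract's latent-conditional discriminator can be sketched as follows. A discriminator D(s, z) is trained to tell dataset states from policy states for a given latent, and its output is converted into an auxiliary reward that pushes the latent-conditioned policy to "cover" the dataset. The log D / (1 - D) mapping below is a common GAN-style discriminator-to-reward choice, assumed here for illustration; the paper's exact regularizer may differ, and `regularization_reward` is a hypothetical name.

```python
# Hedged sketch of conditional-policy regularization via a
# latent-conditional discriminator D(s, z). The discriminator is any
# callable returning the probability that (state, latent) was drawn
# from the unlabeled behavior dataset rather than from the policy.

import math
from typing import Callable, Sequence

def regularization_reward(
    discriminator: Callable[[Sequence[float], Sequence[float]], float],
    state: Sequence[float],
    latent: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """GAN-style shaping reward log D / (1 - D).

    Positive where the policy's states look like dataset states for
    this latent, negative where they do not, so maximizing it aligns
    each latent-conditioned policy with the behavior dataset.
    """
    d = min(max(discriminator(state, latent), eps), 1.0 - eps)
    return math.log(d) - math.log(1.0 - d)
```

A state the discriminator cannot distinguish (D = 0.5) yields zero reward, so the regularizer vanishes exactly when the policy's state distribution matches the dataset's, which is the alignment property the abstract describes.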
Problem

Research questions and friction points this paper is trying to address.

Develop zero-shot humanoid control via behavioral foundation models
Improve unsupervised RL alignment with unlabeled behavior datasets
Enable human-like task solving without task-specific training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Forward-Backward Representations with Conditional-Policy Regularization
Latent-conditional discriminator for policy alignment
Zero-shot generalization for reward and imitation tasks
🔎 Similar Papers