🤖 AI Summary
This work addresses the poor generalization and limited zero-shot transferability of task-specific policies in humanoid robot control. We propose the first promptable Behavior Foundation Model (BFM), enabling unified execution of diverse tasks (motion tracking, goal reaching, and reward optimization) on a real-world Unitree G1 robot. Methodologically, we employ unsupervised reinforcement learning to construct a task-shared latent space, integrating forward-backward dynamics modeling, critical reward shaping, domain randomization, and history-dependent asymmetric learning. We further design a latent-space-driven multi-policy inference mechanism that supports flexible task adaptation. Experiments demonstrate robust whole-body control in both simulation and physical deployment, with strong zero-shot and few-shot adaptability. Ablation studies confirm that each component contributes materially to cross-task generalization.
📝 Abstract
Building Behavioral Foundation Models (BFMs) for humanoid robots has the potential to unify diverse control tasks under a single, promptable generalist policy. However, existing approaches are either deployed exclusively on simulated humanoid characters or specialized to specific tasks such as tracking. We propose BFM-Zero, a framework that learns an effective shared latent representation embedding motions, goals, and rewards into a common space, enabling a single policy to be prompted for multiple downstream tasks without retraining. This well-structured latent space enables versatile and robust whole-body skills on a Unitree G1 humanoid in the real world via diverse inference methods: zero-shot motion tracking, goal reaching, and reward optimization, as well as few-shot optimization-based adaptation. Unlike prior on-policy reinforcement learning (RL) frameworks, BFM-Zero builds upon recent advancements in unsupervised RL and Forward-Backward (FB) models, which offer an objective-centric, explainable, and smooth latent representation of whole-body motions. We further extend BFM-Zero with critical reward shaping, domain randomization, and history-dependent asymmetric learning to bridge the sim-to-real gap. These key design choices are quantitatively ablated in simulation. A first-of-its-kind model, BFM-Zero establishes a step toward scalable, promptable behavioral foundation models for whole-body humanoid control.
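The prompting modes described above can be illustrated with a minimal sketch of how a Forward-Backward latent space is queried at inference time. This is not the paper's implementation: the linear backward map `B`, the dimensions, and all function names are hypothetical stand-ins, and in practice `B` is a learned network trained jointly with a forward model and a latent-conditioned policy. The sketch only shows the standard FB prompting rules: a goal prompt is the goal state's embedding, and a reward prompt is the reward-weighted average embedding over an offline state buffer.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM = 4   # illustrative; real humanoid states are far higher-dimensional
LATENT_DIM = 8  # illustrative latent size

# Hypothetical stand-in for the learned backward network: a fixed random
# linear map. In an FB model this would be a trained neural network.
B_weights = rng.normal(size=(LATENT_DIM, STATE_DIM))

def B(state):
    """Backward embedding: map a state into the shared latent space."""
    return B_weights @ state

def prompt_from_goal(goal_state):
    """Zero-shot goal reaching: the prompt is the goal's latent embedding."""
    z = B(goal_state)
    return z / np.linalg.norm(z)  # FB latents are typically normalized

def prompt_from_reward(states, rewards):
    """Zero-shot reward optimization: z ~ E_s[r(s) * B(s)] over a buffer."""
    z = np.mean([r * B(s) for s, r in zip(states, rewards)], axis=0)
    return z / np.linalg.norm(z)

# The latent-conditioned policy pi(s, z) would then be queried with the
# same fixed z at every control step; no retraining is involved.
goal_z = prompt_from_goal(np.ones(STATE_DIM))
buffer_states = rng.normal(size=(16, STATE_DIM))
buffer_rewards = rng.normal(size=16)
reward_z = prompt_from_reward(buffer_states, buffer_rewards)
```

Motion tracking fits the same pattern: each reference frame is embedded with `B` and the resulting latent sequence is fed to the policy step by step, which is why one policy can serve all three task families.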