🤖 AI Summary
This work addresses the poor generalization and limited zero-shot transferability of task-specific policies in humanoid robot control. We propose the first promptable Behavior Foundation Model (BFM), enabling unified execution of diverse tasks (motion tracking, goal reaching, and reward optimization) on a real-world Unitree G1 robot. Methodologically, we employ unsupervised reinforcement learning to construct a task-shared latent space, integrating forward-backward dynamics modeling, critical reward shaping, domain randomization, and history-dependent asymmetric learning. We further design a latent-space-driven multi-policy inference mechanism that supports flexible task adaptation. Experiments demonstrate robust whole-body control in both simulation and physical deployment, with strong zero-shot and few-shot adaptability. Ablation studies confirm that each component contributes materially to cross-task generalization.
📝 Abstract
Building Behavioral Foundation Models (BFMs) for humanoid robots has the potential to unify diverse control tasks under a single, promptable generalist policy. However, existing approaches are either deployed exclusively on simulated humanoid characters or specialized to specific tasks such as tracking. We propose BFM-Zero, a framework that learns an effective shared latent representation embedding motions, goals, and rewards into a common space, enabling a single policy to be prompted for multiple downstream tasks without retraining. This well-structured latent space enables versatile and robust whole-body skills on a Unitree G1 humanoid in the real world via diverse inference methods: zero-shot motion tracking, goal reaching, and reward optimization, as well as few-shot optimization-based adaptation. Unlike prior on-policy reinforcement learning (RL) frameworks, BFM-Zero builds upon recent advancements in unsupervised RL and Forward-Backward (FB) models, which offer an objective-centric, explainable, and smooth latent representation of whole-body motions. We further extend BFM-Zero with critical reward shaping, domain randomization, and history-dependent asymmetric learning to bridge the sim-to-real gap. These key design choices are quantitatively ablated in simulation. A first-of-its-kind model, BFM-Zero establishes a step toward scalable, promptable behavioral foundation models for whole-body humanoid control.
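The prompting modes described above can be illustrated with a minimal sketch of how a Forward-Backward latent space is queried at inference time. This is not the paper's implementation: the linear backward map `B`, the dimensions, and all function names are hypothetical stand-ins, and in practice `B` is a learned network trained jointly with a forward model and a latent-conditioned policy. The sketch only shows the standard FB prompting rules: a goal prompt is the goal state's embedding, and a reward prompt is the reward-weighted average embedding over an offline state buffer.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM = 4   # illustrative; real humanoid states are far higher-dimensional
LATENT_DIM = 8  # illustrative latent size

# Hypothetical stand-in for the learned backward network: a fixed random
# linear map. In an FB model this would be a trained neural network.
B_weights = rng.normal(size=(LATENT_DIM, STATE_DIM))

def B(state):
    """Backward embedding: map a state into the shared latent space."""
    return B_weights @ state

def prompt_from_goal(goal_state):
    """Zero-shot goal reaching: the prompt is the goal's latent embedding."""
    z = B(goal_state)
    return z / np.linalg.norm(z)  # FB latents are typically normalized

def prompt_from_reward(states, rewards):
    """Zero-shot reward optimization: z ~ E_s[r(s) * B(s)] over a buffer."""
    z = np.mean([r * B(s) for s, r in zip(states, rewards)], axis=0)
    return z / np.linalg.norm(z)

# The latent-conditioned policy pi(s, z) would then be queried with the
# same fixed z at every control step; no retraining is involved.
goal_z = prompt_from_goal(np.ones(STATE_DIM))
buffer_states = rng.normal(size=(16, STATE_DIM))
buffer_rewards = rng.normal(size=16)
reward_z = prompt_from_reward(buffer_states, buffer_rewards)
```

Motion tracking fits the same pattern: each reference frame is embedded with `B` and the resulting latent sequence is fed to the policy step by step, which is why one policy can serve all three task families.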