🤖 AI Summary
Existing whole-body control (WBC) frameworks suffer from strong task specificity, reliance on hand-crafted reward functions, and poor generalization, which hinders adaptation to complex real-world scenarios and diverse control modalities. To address this, we propose the Behavior Foundation Model (BFM), a unified WBC framework for humanoid robots, pretrained on large-scale heterogeneous behavioral data. BFM employs a conditional variational autoencoder (CVAE) to model the distribution of humanoid robot behaviors and introduces masked online distillation to efficiently extract reusable behavioral knowledge. This enables zero-shot transfer and rapid adaptation to novel skills without retraining. Evaluated both in simulation and on physical humanoid platforms, BFM demonstrates substantial improvements in cross-task generalization, multimodal control compatibility, and environmental adaptability. Our approach establishes a new paradigm for generalist embodied intelligence in robot control.
📝 Abstract
Whole-body control (WBC) of humanoid robots has witnessed remarkable progress in skill versatility, enabling a wide range of applications such as locomotion, teleoperation, and motion tracking. Despite these achievements, existing WBC frameworks remain largely task-specific, relying heavily on labor-intensive reward engineering and demonstrating limited generalization across tasks and skills. These limitations hinder their ability to respond to arbitrary control modes and restrict their deployment in complex, real-world scenarios. To address these challenges, we revisit existing WBC systems and identify a shared objective across diverse tasks: the generation of appropriate behaviors that guide the robot toward desired goal states. Building on this insight, we propose the Behavior Foundation Model (BFM), a generative model pretrained on large-scale behavioral datasets to capture broad, reusable behavioral knowledge for humanoid robots. BFM integrates a masked online distillation framework with a Conditional Variational Autoencoder (CVAE) to model behavioral distributions, thereby enabling flexible operation across diverse control modes and efficient acquisition of novel behaviors without retraining from scratch. Extensive experiments both in simulation and on a physical humanoid platform demonstrate that BFM generalizes robustly across diverse WBC tasks while rapidly adapting to new behaviors. These results establish BFM as a promising step toward a foundation model for general-purpose humanoid control.
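To make the CVAE component concrete, the sketch below shows the generic structure of a conditional VAE forward pass for behavior modeling: an encoder maps an observation and a goal/command condition to a latent behavior distribution, a sample is drawn via the reparameterization trick, and a decoder maps the latent code plus the condition back to an action. This is a minimal illustration with made-up dimensions and untrained random weights, not the paper's implementation; the names `OBS_DIM`, `COND_DIM`, `LATENT_DIM`, and `ACT_DIM` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): proprioceptive observation,
# goal/command condition, latent behavior code, and joint-space action.
OBS_DIM, COND_DIM, LATENT_DIM, ACT_DIM = 48, 16, 8, 19

def linear(in_dim, out_dim):
    """Random affine layer standing in for a trained network."""
    return rng.standard_normal((in_dim, out_dim)) * 0.1, np.zeros(out_dim)

W_enc, b_enc = linear(OBS_DIM + COND_DIM, 2 * LATENT_DIM)
W_dec, b_dec = linear(LATENT_DIM + COND_DIM, ACT_DIM)

def encode(obs, cond):
    # Encoder conditions on both the observation and the control command,
    # producing the mean and log-variance of the latent behavior posterior.
    h = np.concatenate([obs, cond]) @ W_enc + b_enc
    return h[:LATENT_DIM], h[LATENT_DIM:]

def reparameterize(mu, logvar):
    # Reparameterization trick: sample z = mu + sigma * eps.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z, cond):
    # Decoder turns a latent behavior code plus the condition into an
    # action; tanh bounds the output like a normalized joint target.
    return np.tanh(np.concatenate([z, cond]) @ W_dec + b_dec)

obs = rng.standard_normal(OBS_DIM)
cond = rng.standard_normal(COND_DIM)
mu, logvar = encode(obs, cond)
action = decode(reparameterize(mu, logvar), cond)
# KL term of the CVAE objective (closed form against a standard normal prior).
kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
print(action.shape)
```

In a setup like the one the abstract describes, masking parts of the condition at training time would let the same decoder serve different control modes; here the forward pass alone is shown for clarity.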