🤖 AI Summary
To address the lack of lightweight, open-source, and high-fidelity video prediction models for humanoid robots operating in human-centered environments, this paper introduces the first open-source world model architecture specifically designed for humanoid robotics. Methodologically, it integrates Masked Transformers, Flow Matching-based generative modeling, multiple attention variants, and action conditioning, coupled with an efficient parameter-sharing strategy that reduces model size by 33–53% without compromising visual fidelity. Trained on 100 hours of real-world egocentric demonstration data collected from humanoid robots, the model supports both single-frame and multi-frame action-conditioned video prediction. It enables efficient training and deployment on just 1–2 GPUs, significantly enhancing action reasoning and long-horizon planning capabilities in open-world settings.
📝 Abstract
Humanoid robots have the potential to perform complex tasks in human-centered environments but require robust predictive models to reason about the outcomes of their actions. We introduce Humanoid World Models (HWM), a family of lightweight, open-source, video-based models that forecast future egocentric observations conditioned on actions. We train two types of generative models, Masked Transformers and Flow Matching, on 100 hours of humanoid demonstrations. Additionally, we explore architectural variants with different attention mechanisms and parameter-sharing strategies. Our parameter-sharing techniques reduce model size by 33–53% with minimal impact on performance or visual fidelity. HWM is designed to be trained and deployed in practical academic and small-lab settings, such as on 1–2 GPUs.
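The reported 33–53% size reduction follows directly from reusing transformer blocks across layers. A back-of-the-envelope sketch of this relationship (not the paper's actual architecture — the function names and the per-layer cost formula here are illustrative assumptions):

```python
# Hypothetical sketch: how sharing transformer blocks across layers
# shrinks the parameter count. The per-layer cost formula below
# (attention ~ 4*d^2, feed-forward ~ 2*ffn_mult*d^2) is a common
# rough estimate, not the paper's exact accounting.

def transformer_params(d_model: int, n_layers: int, ffn_mult: int = 4) -> int:
    """Approximate parameter count of a standard transformer stack."""
    per_layer = 4 * d_model**2 + 2 * ffn_mult * d_model**2
    return n_layers * per_layer

def shared_params(d_model: int, n_layers: int, n_unique: int, ffn_mult: int = 4) -> int:
    """Same depth, but only n_unique distinct blocks are stored and
    reused cyclically across all n_layers."""
    per_layer = 4 * d_model**2 + 2 * ffn_mult * d_model**2
    return n_unique * per_layer

full = transformer_params(512, 12)
half = shared_params(512, 12, 6)        # each block reused twice
print(f"reduction: {1 - half / full:.0%}")   # prints "reduction: 50%"
```

Under this sketch, sharing roughly a third to a half of the blocks yields reductions in the 33–53% range the paper reports, since only the unique blocks contribute parameters.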