🤖 AI Summary
This work addresses core challenges in whole-body control of humanoid robots, namely high-dimensional action spaces, the inherent instability of bipedal morphologies, and the difficulty of end-to-end visual learning, by proposing a hierarchical world model architecture that requires no handcrafted rewards, simplifying assumptions, or skill priors. The architecture decouples high-level visual decision-making from low-level motor execution: a high-level agent issues commands from visual observations, and a low-level agent executes them, with both levels jointly optimized via reinforcement learning. Evaluated on a simulated 56-DoF humanoid platform (Isaac Gym), the system generalizes across multiple tasks from raw visual inputs alone, attaining high-performing policies on eight complex tasks whose motion quality human evaluators rate as significantly better than existing baselines. To the authors' knowledge, this is the first demonstration of end-to-end, multi-task, generalizable whole-body control for high-DoF humanoid robots driven solely by vision.
📝 Abstract
Whole-body control for humanoids is challenging due to the high dimensionality of the problem and the inherent instability of a bipedal morphology; learning from visual observations further exacerbates this difficulty. In this work, we explore highly data-driven approaches to visual whole-body humanoid control based on reinforcement learning, without any simplifying assumptions, reward design, or skill primitives. Specifically, we propose a hierarchical world model in which a high-level agent generates commands based on visual observations for a low-level agent to execute, both of which are trained with rewards. Our approach produces highly performant control policies across eight tasks with a simulated 56-DoF humanoid, while synthesizing motions that are broadly preferred by humans.
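To make the decoupling concrete, here is a minimal sketch of the two-level interface described above: a high-level module that maps raw pixels to a low-dimensional command, and a low-level module that maps proprioception plus that command to joint actions. This is not the paper's implementation; all names and dimensions (`HighLevelAgent`, `LowLevelAgent`, `COMMAND_DIM`, `PROPRIO_DIM`, the 64x64 frame size) are illustrative assumptions, and the world-model learning and RL training loops are omitted.

```python
import torch
import torch.nn as nn

# Assumed dimensions for illustration only; the paper's exact
# observation and command sizes may differ.
VISUAL_DIM = (3, 64, 64)   # one RGB frame
PROPRIO_DIM = 120          # joint positions/velocities, etc. (assumed)
COMMAND_DIM = 16           # low-dimensional command vector (assumed)
ACTION_DIM = 56            # one action per DoF of the 56-DoF humanoid

class HighLevelAgent(nn.Module):
    """Maps raw visual observations to a low-dimensional command."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened feature size from a dummy forward pass.
        with torch.no_grad():
            feat = self.encoder(torch.zeros(1, *VISUAL_DIM)).shape[1]
        self.head = nn.Linear(feat, COMMAND_DIM)

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.head(self.encoder(rgb)))

class LowLevelAgent(nn.Module):
    """Maps proprioception plus the high-level command to joint actions."""
    def __init__(self):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(PROPRIO_DIM + COMMAND_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM), nn.Tanh(),
        )

    def forward(self, proprio: torch.Tensor, command: torch.Tensor) -> torch.Tensor:
        return self.policy(torch.cat([proprio, command], dim=-1))

# One control step of the hierarchy: vision -> command -> motor action.
high, low = HighLevelAgent(), LowLevelAgent()
rgb = torch.zeros(1, *VISUAL_DIM)
proprio = torch.zeros(1, PROPRIO_DIM)
action = low(proprio, high(rgb))
print(action.shape)  # torch.Size([1, 56])
```

The key design point the sketch illustrates is the narrow interface between the levels: the visual agent communicates only through a low-dimensional command, leaving the full 56-dimensional actuation problem to the low-level agent.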