🤖 AI Summary
This work addresses the challenge of enabling humanoid robots to actively exploit physical contact for enhanced autonomy in unstructured environments. We propose an offline reinforcement learning framework that integrates a learned world model with sampling-based model predictive control (MPC). The method takes proprioception and egocentric depth imagery as input and is trained end-to-end on demonstration-free offline data, without requiring policy pretraining. To mitigate sparse contact rewards and sensor noise, it employs latent-space future prediction and a learned surrogate value function. Technically, it unifies compressed latent sequence modeling, dense value estimation, and real-time MPC optimization. Evaluated on a physical humanoid robot, the approach accomplishes contact-intensive tasks—including wall-supported standing, disturbance rejection, and obstacle navigation—outperforming on-policy RL baselines, with substantial improvements in data efficiency and multi-task generalization.
📝 Abstract
Enabling humanoid robots to exploit physical contact, rather than simply avoid collisions, is crucial for autonomy in unstructured environments. Traditional optimization-based planners struggle with contact complexity, while on-policy reinforcement learning (RL) is sample-inefficient and has limited multi-task ability. We propose a framework combining a learned world model with sampling-based Model Predictive Control (MPC), trained on a demonstration-free offline dataset to predict future outcomes in a compressed latent space. To address sparse contact rewards and sensor noise, the MPC uses a learned surrogate value function for dense, robust planning. Our single, scalable model supports contact-aware tasks, including wall support after perturbation, blocking incoming objects, and traversing height-limited arches, with improved data efficiency and multi-task capability over on-policy RL. Deployed on a physical humanoid, our system achieves robust, real-time contact planning from proprioception and egocentric depth images. Website: https://ego-vcp.github.io/
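To make the planning loop concrete, the sketch below shows one common form of sampling-based MPC over a learned latent world model, with a learned value function as a dense terminal signal. This is a minimal CEM-style illustration under assumed interfaces, not the paper's implementation: `dynamics`, `reward`, and `value` are hypothetical stand-ins (here just random linear maps) for the learned latent transition model, reward head, and surrogate value function.

```python
import numpy as np

# Hypothetical stand-ins for learned components (names and shapes assumed):
# dynamics() predicts the next compressed latent, reward() scores a step,
# and value() is the surrogate value used to bootstrap the rollout return.
rng = np.random.default_rng(0)
LATENT_DIM, ACT_DIM, HORIZON, N_SAMPLES = 16, 4, 10, 256

W_dyn = rng.normal(scale=0.1, size=(LATENT_DIM + ACT_DIM, LATENT_DIM))
w_val = rng.normal(scale=0.1, size=LATENT_DIM)

def dynamics(z, a):
    # Toy latent transition: nonlinear map of [latent, action].
    return np.tanh(np.concatenate([z, a], axis=-1) @ W_dyn)

def reward(z, a):
    # Toy dense reward with a small action penalty.
    return -np.sum(z**2, axis=-1) - 0.01 * np.sum(a**2, axis=-1)

def value(z):
    # Toy surrogate value providing a dense terminal estimate.
    return z @ w_val

def plan(z0, n_iters=3, elite_frac=0.1, gamma=0.99):
    """CEM-style sampling MPC: sample action sequences, roll them out
    in latent space, bootstrap with the value, refit the sampler."""
    mu = np.zeros((HORIZON, ACT_DIM))
    sigma = np.ones((HORIZON, ACT_DIM))
    n_elite = max(1, int(elite_frac * N_SAMPLES))
    for _ in range(n_iters):
        acts = mu + sigma * rng.normal(size=(N_SAMPLES, HORIZON, ACT_DIM))
        z = np.tile(z0, (N_SAMPLES, 1))
        ret = np.zeros(N_SAMPLES)
        for t in range(HORIZON):
            ret += gamma**t * reward(z, acts[:, t])
            z = dynamics(z, acts[:, t])
        # Terminal value stands in for sparse contact rewards beyond the horizon.
        ret += gamma**HORIZON * value(z)
        elites = acts[np.argsort(ret)[-n_elite:]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu[0]  # execute the first action, then replan (receding horizon)

a0 = plan(rng.normal(size=LATENT_DIM))
```

In a receding-horizon deployment, only the first action of the optimized sequence is executed each control step, and the optimizer is warm-started from the previous solution.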