AI Summary
This study addresses humanoid robots' heavy reliance on costly, scarce robot demonstration data, aiming to improve cross-task and cross-platform generalization and robustness. To this end, the authors propose the Human Action Transformer (HAT), a cross-embodiment policy trained on egocentric human manipulation videos. The work introduces PH2D, a task-aligned egocentric human dataset collected to directly match humanoid manipulation demonstrations, and uses a Transformer architecture to learn a unified state-action representation shared by humans and humanoid robots. Crucially, HAT incorporates differentiable motion retargeting, so human and humanoid robot actions are modeled jointly without additional supervision. By co-training on human videos together with a smaller amount of robot demonstrations, HAT significantly improves policy generalization; human data collection is over an order of magnitude more efficient than collecting pure robot demonstrations, and the learned policy supports zero-shot task transfer.
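The unified state-action representation described above can be illustrated with a minimal sketch. The layout below (a 6-DoF head pose plus two 6-DoF wrist poses, all in the egocentric head frame) is an assumption for illustration, not the paper's exact specification; the point is only that human and robot observations are packed into the same vector so one policy consumes both:

```python
import numpy as np

def unified_state(head_pose, left_wrist, right_wrist):
    """Pack human or humanoid observations into one shared state vector.

    The fields are illustrative: a 6-DoF head pose and two 6-DoF wrist
    poses, expressed in the head (egocentric) frame so the same layout
    applies to either embodiment.
    """
    return np.concatenate([head_pose, left_wrist, right_wrist])  # shape (18,)

# A human sample and a robot sample share one representation, so a
# single policy network can be co-trained on both data sources.
human_state = unified_state(np.zeros(6), np.ones(6), np.ones(6))
robot_state = unified_state(np.zeros(6), 0.5 * np.ones(6), 0.5 * np.ones(6))
assert human_state.shape == robot_state.shape == (18,)
```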
Abstract
Training manipulation policies for humanoid robots with diverse data enhances their robustness and generalization across tasks and platforms. However, learning solely from robot demonstrations is labor-intensive, requiring expensive tele-operated data collection, which is difficult to scale. This paper investigates a more scalable data source, egocentric human demonstrations, to serve as cross-embodiment training data for robot learning. We mitigate the embodiment gap between humanoids and humans from both the data and modeling perspectives. We collect an egocentric task-oriented dataset (PH2D) that is directly aligned with humanoid manipulation demonstrations. We then train a human-humanoid behavior policy, which we term Human Action Transformer (HAT). The state-action space of HAT is unified for both humans and humanoid robots and can be differentiably retargeted to robot actions. Co-trained with smaller-scale robot data, HAT directly models humanoid robots and humans as different embodiments without additional supervision. We show that human data improves both the generalization and robustness of HAT with significantly better data collection efficiency. Code and data: https://human-as-robot.github.io/
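Why "differentiably retargeted" matters can be sketched with a toy example. Here the retargeting is a fixed linear map from hypothetical human hand keypoint coordinates to robot joint targets (the paper's actual mapping is more elaborate); because the map is differentiable, a loss computed in robot action space still yields gradients for the human-space policy output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear retargeting: 15 human hand keypoint coordinates
# mapped to 6 robot hand joint targets. Any differentiable map works;
# a matrix keeps the chain rule explicit.
R = rng.normal(size=(6, 15))

def retarget(human_action):
    return R @ human_action               # robot joint targets

def loss_and_grad(human_action, robot_target):
    err = retarget(human_action) - robot_target
    loss = 0.5 * float(err @ err)         # squared error in robot space
    grad = R.T @ err                      # chain rule back through retargeting
    return loss, grad

# One gradient step reduces the loss: supervision expressed in robot
# action space trains the shared human-space policy output.
a = rng.normal(size=15)
target = rng.normal(size=6)
l0, g = loss_and_grad(a, target)
l1, _ = loss_and_grad(a - 0.01 * g, target)
assert l1 < l0
```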