🤖 AI Summary
Existing approaches to whole-body manipulation for humanoid robots rely on teleoperation or vision-based sim-to-real reinforcement learning, both of which suffer from limited generalization due to hardware complexity and the difficulty of reward design. This work proposes HuMI, a framework that, for the first time, enables portable, robot-free capture of human full-body motion. By integrating hierarchical imitation learning with motion retargeting, HuMI efficiently maps human demonstrations to feasible robot skills. The method substantially improves data efficiency and environmental adaptability, achieving a threefold increase in data collection efficiency over teleoperation and a 70% success rate in unseen environments across five tasks: kneeling, squatting, tossing, walking, and bimanual manipulation.
📝 Abstract
Current approaches to humanoid whole-body manipulation, which rely primarily on teleoperation or visual sim-to-real reinforcement learning, are hindered by hardware logistics and complex reward engineering. Consequently, demonstrated autonomous skills remain limited and are typically restricted to controlled environments. In this paper, we present the Humanoid Manipulation Interface (HuMI), a portable and efficient framework for learning diverse whole-body manipulation tasks across varied environments. HuMI enables robot-free data collection by capturing rich whole-body motion with portable hardware. This data drives a hierarchical learning pipeline that translates human motions into dexterous and feasible humanoid skills. Extensive experiments across five whole-body tasks (kneeling, squatting, tossing, walking, and bimanual manipulation) demonstrate that HuMI achieves a 3x increase in data collection efficiency compared to teleoperation and attains a 70% success rate in unseen environments.