Learning Versatile Humanoid Manipulation with Touch Dreaming

📅 2026-04-14

🤖 AI Summary
This work addresses stable, dexterous, contact-aware whole-body manipulation for humanoid robots under frequent contact changes. The authors propose a touch-centered policy learning framework with three parts: an RL-based whole-body controller that keeps the lower body and torso stable during manipulation; a data collection system that combines VR-based teleoperation with human-to-humanoid motion mapping to gather real-world demonstrations efficiently; and Humanoid Transformer with Touch Dreaming (HTD), a multimodal encoder-decoder Transformer that treats touch as a core modality alongside multi-view vision and proprioception. With touch dreaming, the policy predicts future hand-joint forces and future tactile latents in addition to action chunks. Across five contact-rich tasks, HTD achieves a 90.9% relative improvement in average success rate over the stronger baseline, and ablations show that latent-space tactile prediction yields a 30% relative gain in success rate over raw tactile prediction.
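The 90.9% and 30% figures are relative improvements, so they scale with the baseline's success rate and are not absolute percentage-point gains. A minimal sketch of the distinction follows; the function name and the success rates in it are hypothetical illustrations, not numbers from the paper.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Return the relative gain (new - baseline) / baseline as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline


# Hypothetical success rates for illustration only (NOT results from the paper).
baseline_rate = 0.40
new_rate = 0.60

rel = relative_improvement(baseline_rate, new_rate)
abs_points = (new_rate - baseline_rate) * 100  # gain in percentage points

print(f"relative improvement: {rel:.1%}")
print(f"absolute gain: {abs_points:.1f} percentage points")
```

The same absolute gain produces a larger relative improvement when the baseline is lower, which is why the baseline matters when reading the reported figures.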

📝 Abstract
Humanoid robots promise general-purpose assistance, yet real-world humanoid loco-manipulation remains challenging because it requires whole-body stability, dexterous hands, and contact-aware perception under frequent contact changes. In this work, we study dexterous, contact-rich humanoid loco-manipulation. We first develop an RL-based whole-body controller that provides stable lower-body and torso execution during complex manipulation. Built on this controller, we develop a whole-body humanoid data collection system that combines VR-based teleoperation with human-to-humanoid motion mapping, enabling efficient collection of real-world demonstrations. We then propose Humanoid Transformer with Touch Dreaming (HTD), a multimodal encoder-decoder Transformer that models touch as a core modality alongside multi-view vision and proprioception. HTD is trained in a single stage with behavioral cloning augmented by touch dreaming: in addition to predicting action chunks, the policy predicts future hand-joint forces and future tactile latents, encouraging the shared Transformer trunk to learn contact-aware representations for dexterous interaction. Across five contact-rich tasks, Insert-T, Book Organization, Towel Folding, Cat Litter Scooping, and Tea Serving, HTD achieves a 90.9% relative improvement in average success rate over the stronger baseline. Ablation results further show that latent-space tactile prediction is more effective than raw tactile prediction, yielding a 30% relative gain in success rate. These results demonstrate that combining robust whole-body execution, scalable humanoid data collection, and predictive touch-centered learning enables versatile, high-dexterity humanoid manipulation in the real world. Project webpage: humanoid-touch-dream.github.io.
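The single-stage objective described in the abstract, behavioral cloning plus touch dreaming, can be pictured as one action loss with two auxiliary prediction losses. The sketch below is only an illustration under stated assumptions: the abstract does not give the loss functions, the weights, or how target tactile latents are produced, so the L1/MSE choices, the `w_force` and `w_tactile` parameters, and the flat-list inputs are all hypothetical.

```python
from typing import Dict, Sequence, Tuple


def mean_abs_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute (L1) error over two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mean_sq_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def touch_dreaming_loss(
    pred: Dict[str, Sequence[float]],
    target: Dict[str, Sequence[float]],
    w_force: float = 1.0,    # assumed weight, not from the paper
    w_tactile: float = 1.0,  # assumed weight, not from the paper
) -> Tuple[float, Dict[str, float]]:
    """Combine an action-chunk loss with two auxiliary 'dreaming' losses.

    Both dicts hold flattened values under the keys 'actions'
    (action chunk), 'forces' (future hand-joint forces), and
    'tactile_latents' (future tactile latents).
    """
    terms = {
        # Behavioral cloning term on the predicted action chunk (L1 assumed).
        "action": mean_abs_error(pred["actions"], target["actions"]),
        # Auxiliary term: future hand-joint forces (MSE assumed).
        "force": mean_sq_error(pred["forces"], target["forces"]),
        # Auxiliary term: future tactile latents (MSE assumed).
        "tactile": mean_sq_error(pred["tactile_latents"], target["tactile_latents"]),
    }
    total = terms["action"] + w_force * terms["force"] + w_tactile * terms["tactile"]
    return total, terms


# Toy values for illustration only.
pred = {"actions": [0.0, 1.0], "forces": [1.0], "tactile_latents": [2.0, 0.0]}
target = {"actions": [0.0, 0.0], "forces": [0.0], "tactile_latents": [0.0, 0.0]}
total, terms = touch_dreaming_loss(pred, target, w_force=0.5, w_tactile=0.25)
print(total, terms)
```

Because all three terms would be backpropagated through the same shared trunk, the auxiliary targets can shape the representation used for action prediction, which is the effect the abstract attributes to touch dreaming.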
Problem

Research questions and friction points this paper is trying to address.

humanoid manipulation
dexterous manipulation
contact-rich interaction
whole-body control
tactile perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

Touch Dreaming
Humanoid Manipulation
Whole-body Control
Multimodal Transformer
Tactile Prediction