EgoScale: Scaling Dexterous Manipulation with Diverse Egocentric Human Data

📅 2026-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of transferring high-DoF dexterous manipulation skills to robots by leveraging large-scale human behavioral data. Building upon 20,854 hours of egocentric human action videos, the authors develop a vision–language–action (VLA) model and introduce a lightweight two-stage transfer strategy: pretraining on human data followed by fine-tuning with a small amount of aligned human–robot demonstration data. They report the first empirical observation of a log-linear scaling law between human data volume and validation loss, which strongly correlates with real-world robotic performance. Evaluated on a 22-DoF dexterous hand, the approach achieves a 54% average improvement in task success rate over non-pretrained baselines and successfully generalizes to lower-DoF robotic hands, demonstrating that large-scale human data can serve as a reusable, embodiment-agnostic prior for robotic motor skills.

📝 Abstract
Human behavior is among the most scalable sources of data for learning physical intelligence, yet how to effectively leverage it for dexterous manipulation remains unclear. While prior work demonstrates human-to-robot transfer in constrained settings, it is unclear whether large-scale human data can support fine-grained, high-degree-of-freedom dexterous manipulation. We present EgoScale, a human-to-dexterous-manipulation transfer framework built on large-scale egocentric human data. We train a Vision-Language-Action (VLA) model on over 20,854 hours of action-labeled egocentric human video, more than 20 times larger than prior efforts, and uncover a log-linear scaling law between human data scale and validation loss. This validation loss strongly correlates with downstream real-robot performance, establishing large-scale human data as a predictable supervision source. Beyond scale, we introduce a simple two-stage transfer recipe: large-scale human pretraining followed by lightweight aligned human-robot mid-training. This enables strong long-horizon dexterous manipulation and one-shot task adaptation with minimal robot supervision. Our final policy improves average success rate by 54% over a no-pretraining baseline using a 22-DoF dexterous robotic hand, and transfers effectively to robots with lower-DoF hands, indicating that large-scale human motion provides a reusable, embodiment-agnostic motor prior.
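The log-linear scaling law described in the abstract can be illustrated with a simple least-squares fit of validation loss against the logarithm of data volume. The data points below are hypothetical placeholders, not measurements from the paper:

```python
import numpy as np

# Hypothetical (human-data hours, validation loss) pairs for illustration;
# the paper's actual measurements are not reproduced here.
hours = np.array([100, 500, 2000, 8000, 20854], dtype=float)
val_loss = np.array([1.20, 1.05, 0.92, 0.80, 0.72])

# Log-linear scaling law: loss ≈ a + b * log(hours), fit by least squares.
b, a = np.polyfit(np.log(hours), val_loss, deg=1)

def predicted_loss(h: float) -> float:
    """Predict validation loss at a given volume of human data (hours)."""
    return a + b * np.log(h)

# A negative slope b means loss falls log-linearly as data scale grows,
# which is what makes data collection a predictable investment.
print(f"slope b = {b:.4f}, intercept a = {a:.4f}")
print(f"predicted loss at 40,000 h: {predicted_loss(40000):.3f}")
```

Under such a law, each doubling of human data hours buys a roughly constant reduction in validation loss, which is why the authors can treat validation loss as a predictable proxy for downstream robot performance.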
Problem

Research questions and friction points this paper is trying to address.

dexterous manipulation
egocentric human data
human to robot transfer
physical intelligence
scaling
Innovation

Methods, ideas, or system contributions that make the work stand out.

EgoScale
dexterous manipulation
egocentric human data
vision-language-action model
scaling law