🤖 AI Summary
To address the generalization bottleneck in dexterous manipulation caused by the scarcity of real-world human demonstration data, this paper proposes the first hand-object tracking and control framework driven purely by synthetic data. Methodologically, the authors design a Hand-Object Planner (HOP) to generate diverse, physically plausible trajectories, and combine reinforcement learning with interactive imitation learning to build a state-conditioned Hand-Object Tracker (HOT), enabling co-optimization of HOP and HOT. The contributions are threefold: (1) complete elimination of real demonstrations: long-horizon, complex tasks (e.g., in-hand reorientation and object rearrangement) are learned solely from synthetic data; (2) zero-shot transfer to unseen object geometries and dexterous hands with differing morphologies; and (3) strong cross-domain generalization, empirically validated on multiple sim-to-real hand-object systems, significantly easing the field's dependence on demonstration data.
📝 Abstract
We present a system for learning generalizable hand-object tracking controllers purely from synthetic data, without requiring any human demonstrations. Our approach makes two key contributions: (1) HOP, a Hand-Object Planner that synthesizes diverse hand-object trajectories; and (2) HOT, a Hand-Object Tracker that bridges synthetic-to-physical transfer through reinforcement learning and interactive imitation learning, delivering a generalizable controller conditioned on target hand-object states. Our method extends to diverse object shapes and hand morphologies. Through extensive evaluations, we show that our approach enables dexterous hands to track challenging, long-horizon sequences, including object rearrangement and agile in-hand reorientation. These results represent a significant step toward scalable foundation controllers for manipulation that learn entirely from synthetic data, breaking the data bottleneck that has long constrained progress in dexterous manipulation.
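To make the planner-tracker split concrete, here is a minimal toy sketch of the overall loop the abstract describes: a planner (here a stand-in for HOP) emits synthetic target trajectories, and a state-conditioned tracker (a stand-in for HOT) is optimized to follow them. Everything below is illustrative, not the paper's actual method: the linear policy, the random-search update (in place of the paper's reinforcement learning and interactive imitation learning), and all function names are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM = 6  # hypothetical combined hand-object state size

def hop_plan(n_steps=10, state_dim=STATE_DIM):
    """Toy stand-in for the Hand-Object Planner (HOP):
    emits a smooth synthetic trajectory of target states."""
    steps = 0.1 * rng.normal(size=(n_steps, state_dim))
    return np.cumsum(steps, axis=0)  # drifting, physically-agnostic path

def hot_policy(weights, current, target):
    """Toy state-conditioned tracker (HOT): a linear controller
    mapping (current state, target state) to a corrective action."""
    features = np.concatenate([current, target])
    return features @ weights

def rollout_error(weights, targets):
    """Roll the tracker along the planned trajectory; return mean error."""
    state = np.zeros(targets.shape[1])
    errs = []
    for target in targets:
        state = state + hot_policy(weights, state, target)
        errs.append(np.linalg.norm(state - target))
    return float(np.mean(errs))

# Co-optimization loop sketch: HOP proposes a trajectory, HOT improves
# via crude random search (a placeholder for the actual RL training).
weights = np.zeros((2 * STATE_DIM, STATE_DIM))
targets = hop_plan()
best = rollout_error(weights, targets)
for _ in range(300):
    candidate = weights + 0.05 * rng.normal(size=weights.shape)
    err = rollout_error(candidate, targets)
    if err < best:
        weights, best = candidate, err
```

The point of the sketch is only the interface: the tracker never sees real demonstrations, it is conditioned on planner-generated target states, and the two components can be iterated jointly.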