🤖 AI Summary
This work addresses a key bottleneck in motion generation: its reliance on large-scale, task-specific motion capture (MoCap) datasets. It proposes a novel method that synthesizes diverse, physically plausible full-body reach-to-grasp motions using only a few seconds of walking MoCap data. Methodologically, it is the first to transfer implicit general locomotion priors from walking data to the grasping task; it introduces an active data generation scheme coupled with local temporal feature alignment to jointly ensure motion naturalness and task success; and it integrates kinematics-driven grasp pose initialization, physics-based simulation optimization, and data augmentation. Experiments demonstrate high grasp success rates and natural motion across diverse scenes and unseen objects, significantly outperforming existing approaches that require complete task-specific MoCap data.
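The local temporal feature alignment is not specified in detail here; as a rough illustration only, below is a minimal sketch assuming windowed per-frame motion features matched by nearest neighbor against the walking data. The function name, feature shapes, and the nearest-neighbor formulation are all our assumptions, not the paper's.

```python
import numpy as np

def local_alignment_penalty(gen_feats, walk_feats, window=8):
    """Hypothetical local temporal feature alignment loss.

    gen_feats:  (T_gen, D) per-frame features of a synthesized motion.
    walk_feats: (T_walk, D) per-frame features from brief walking MoCap.
    For each window of the generated motion, find the closest walking
    window and penalize the distance, rewarding locally walking-like motion.
    """
    def windows(x):
        # Stack overlapping windows of shape (window, D) and flatten each one.
        idx = np.arange(len(x) - window + 1)[:, None] + np.arange(window)
        return x[idx].reshape(len(idx), -1)

    g, w = windows(gen_feats), windows(walk_feats)
    # Pairwise squared distances between generated and walking windows.
    d2 = ((g[:, None, :] - w[None, :, :]) ** 2).sum(-1)
    # Each generated window is scored against its nearest walking window.
    return d2.min(axis=1).mean()

# Example with random stand-ins for motion features.
rng = np.random.default_rng(0)
loss = local_alignment_penalty(rng.normal(size=(60, 16)), rng.normal(size=(120, 16)))
print(f"alignment penalty: {loss:.3f}")
```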
📝 Abstract
Existing motion generation methods based on MoCap data are often limited by data quality and coverage. In this work, we propose a framework that generates diverse, physically feasible full-body human reaching and grasping motions using only brief walking MoCap data. Our approach builds on two observations: walking data captures valuable movement patterns that transfer across tasks, and advanced kinematic methods can generate diverse grasping poses, which can then be interpolated into motions that serve as task-specific guidance. The framework incorporates an active data generation strategy to maximize the utility of the generated motions, along with a local feature alignment mechanism that transfers natural movement patterns from walking data, improving both the success rate and naturalness of the synthesized motions. By combining the fidelity and stability of natural walking with the flexibility and generalizability of task-specific generated data, our method demonstrates strong performance and robust adaptability across diverse scenes and with unseen objects.
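To make the "interpolated into motions" step concrete, here is a minimal sketch, assuming SMPL-style axis-angle joint rotations and per-joint spherical interpolation from a start pose toward a kinematically generated grasp pose. The representation, function name, and naive linear root path are illustrative assumptions, not the paper's method.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_to_grasp(start_pose, grasp_pose, start_root, grasp_root, num_frames):
    """Interpolate a start body pose toward a kinematic grasp pose.

    start_pose, grasp_pose: (J, 3) axis-angle joint rotations (SMPL-style).
    start_root, grasp_root: (3,) root translations.
    Returns per-frame poses (T, J, 3) and root translations (T, 3) that can
    serve as a rough task-specific guide motion.
    """
    T, J = num_frames, start_pose.shape[0]
    ts = np.linspace(0.0, 1.0, T)
    poses = np.empty((T, J, 3))
    for j in range(J):
        # Spherical interpolation per joint avoids the artifacts of
        # linearly blending axis-angle vectors directly.
        key_rots = Rotation.from_rotvec(np.stack([start_pose[j], grasp_pose[j]]))
        slerp = Slerp([0.0, 1.0], key_rots)
        poses[:, j] = slerp(ts).as_rotvec()
    # Naive straight-line root trajectory; in the paper's pipeline a
    # physics-based optimization stage would refine such a guide.
    roots = (1 - ts)[:, None] * start_root + ts[:, None] * grasp_root
    return poses, roots
```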