Grasp as You Dream: Imitating Functional Grasping from Generated Human Demonstrations

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of enabling robots to achieve general-purpose functional grasping in open-world settings, which typically demands extensive real-world interaction data. To overcome this limitation, the authors propose a zero-shot approach that leverages pretrained visual generative models (such as video generation models) to synthesize human manipulation demonstrations. From these synthetic demonstrations, implicit physical interaction priors are extracted and integrated with embodied action optimization to learn effective grasping policies. Notably, the method requires no real interaction data yet generalizes robustly across diverse objects and tasks. It significantly outperforms existing approaches on multiple public benchmarks, demonstrating high data efficiency and strong generalization. Its effectiveness is further validated on a real robotic platform.
📝 Abstract
Building generalist robots capable of performing functional grasping in everyday, open-world environments remains a significant challenge due to the vast diversity of objects and tasks. Existing methods are either constrained to narrow object/task sets or rely on prohibitively large-scale data collection to capture real-world variability. In this work, we present an alternative approach, GraspDreamer, a method that leverages human demonstrations synthesized by visual generative models (VGMs), e.g., video generation models, to enable zero-shot functional grasping without labor-intensive data collection. The key idea is that VGMs pre-trained on internet-scale human data implicitly encode generalized priors about how humans interact with the physical world, which can be combined with embodiment-specific action optimization to enable functional grasping with minimal effort. Extensive experiments on public benchmarks with different robot hands demonstrate the superior data efficiency and generalization performance of GraspDreamer compared to previous methods. Real-world evaluations further validate its effectiveness on real robots. Additionally, we showcase that GraspDreamer can (1) be naturally extended to downstream manipulation tasks, and (2) generate data to support visuomotor policy learning.
Problem

Research questions and friction points this paper is trying to address.

functional grasping
generalist robots
open-world environments
object diversity
task variability
Innovation

Methods, ideas, or system contributions that make the work stand out.

functional grasping
visual generative models
zero-shot learning
human demonstration synthesis
robotic manipulation