🤖 AI Summary
Data scarcity severely hinders the development of dexterous-hand embodied intelligence, particularly due to the lack of scalable, trainable methods for generating large-scale manipulation tasks. To address this, we propose the first generative simulation framework tailored for high-DoF dexterous hands. Our method leverages vision-language models (VLMs) to drive closed-loop feedback optimization—dynamically adjusting object placement and scale to enhance scene realism—and incorporates subtask decomposition to enable sequential reinforcement learning training. Furthermore, VLMs perform semantic quality assessment and iterative refinement of generated tasks. This approach significantly improves simulation diversity and task plausibility, yielding a 42% increase in training efficiency and a 31% improvement in task success rate. To our knowledge, this is the first scalable, high-fidelity, semantically controllable paradigm for generating dexterous manipulation simulation data for embodied intelligence.
📝 Abstract
Data scarcity remains a fundamental bottleneck for embodied intelligence. Existing approaches use large language models (LLMs) to automate gripper-based simulation generation, but they transfer poorly to dexterous manipulation, which demands more specialized environment design. Meanwhile, dexterous manipulation tasks are inherently more difficult due to their higher degrees of freedom. Massively generating feasible and trainable dexterous hand tasks remains an open challenge. To this end, we present GenDexHand, a generative simulation pipeline that autonomously produces diverse robotic tasks and environments for dexterous manipulation. GenDexHand introduces a closed-loop refinement process that adjusts object placements and scales based on vision-language model (VLM) feedback, substantially improving the average quality of generated environments. Each task is further decomposed into sub-tasks to enable sequential reinforcement learning, reducing training time and increasing success rates. Our work provides a viable path toward scalable training of diverse dexterous hand behaviors in embodied intelligence by offering a simulation-based solution to synthetic data generation. Our website: https://winniechen2002.github.io/GenDexHand/.