🤖 AI Summary
Visuomotor policies often require extensive human demonstrations and generalize poorly across spatial configurations. To address this, we propose DemoGen, a method that generates synthetic demonstrations from only a single real-world demonstration per task. The approach adapts the demonstrated action trajectory to novel object configurations and synthesizes the matching visual observations by editing 3D point clouds to rearrange the objects in the scene. The method significantly improves policy performance across diverse real-world tasks, including those involving deformable objects, dexterous hands, and bimanual platforms, and can be extended to out-of-distribution capabilities such as disturbance resistance and obstacle avoidance. Experiments demonstrate that, using just one real demonstration per task, our method achieves generalization performance comparable to conventional approaches relying on dozens to hundreds of manually collected demonstrations.
📝 Abstract
Visuomotor policies have shown great promise in robotic manipulation but often require substantial amounts of human-collected data for effective performance. A key reason underlying the data demands is their limited spatial generalization capability, which necessitates extensive data collection across different object configurations. In this work, we present DemoGen, a low-cost, fully synthetic approach for automatic demonstration generation. Using only one human-collected demonstration per task, DemoGen generates spatially augmented demonstrations by adapting the demonstrated action trajectory to novel object configurations. Visual observations are synthesized by leveraging 3D point clouds as the modality and rearranging the subjects in the scene via 3D editing. Empirically, DemoGen significantly enhances policy performance across a diverse range of real-world manipulation tasks, showing its applicability even in challenging scenarios involving deformable objects, dexterous hand end-effectors, and bimanual platforms. Furthermore, DemoGen can be extended to enable additional out-of-distribution capabilities, including disturbance resistance and obstacle avoidance.
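To make the core idea concrete, here is a minimal sketch of spatial augmentation on a toy scene. It is not DemoGen's implementation. It assumes the object's points are already segmented (`object_mask`), that the new configuration differs from the original by a pure translation (`offset`), and that the trajectory can be adapted by blending the offset in linearly over the first `approach_steps` timesteps so the robot's starting pose stays fixed. The function name, parameters, and blending rule are all hypothetical.

```python
import numpy as np


def synthesize_demo(points, object_mask, actions, offset, approach_steps):
    """Create one spatially augmented demonstration (illustrative only).

    points:         (N, 3) scene point cloud from the source demonstration
    object_mask:    (N,) boolean array, True for points on the manipulated object
    actions:        (T, 3) end-effector positions from the source demonstration
    offset:         (3,) translation applied to the object (assumed pure translation)
    approach_steps: number of steps over which the offset is blended in (assumption)
    """
    offset = np.asarray(offset, dtype=float)

    # 3D editing: move only the object's points; the rest of the scene stays put.
    new_points = np.asarray(points, dtype=float).copy()
    new_points[object_mask] += offset

    # Trajectory adaptation: weight ramps from 0 (start pose unchanged)
    # to 1 (fully shifted) by the time the gripper reaches the object.
    num_steps = len(actions)
    weights = np.clip(np.arange(num_steps) / approach_steps, 0.0, 1.0)
    new_actions = np.asarray(actions, dtype=float) + weights[:, None] * offset

    return new_points, new_actions


# Toy scene: one table point and two object points.
points = np.array([[0.0, 0.0, 0.0],
                   [0.5, 0.0, 0.1],
                   [0.5, 0.1, 0.1]])
object_mask = np.array([False, True, True])

# Toy trajectory: approach the object, then hold.
actions = np.array([[0.0, 0.0, 0.3],
                    [0.25, 0.0, 0.2],
                    [0.5, 0.0, 0.1],
                    [0.5, 0.0, 0.1]])

new_points, new_actions = synthesize_demo(
    points, object_mask, actions, offset=[0.1, -0.2, 0.0], approach_steps=2
)
```

Calling `synthesize_demo` repeatedly with different offsets would produce a set of synthetic demonstrations from the single source demonstration. A practical pipeline would also need to handle object rotations, per-timestep point clouds, and the robot's own points in the observation, all of which this sketch omits.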