🤖 AI Summary
This work addresses the challenge of learning robust bimanual robot policies from a single demonstration in contact-rich tasks, where even small disturbances can push the system into off-manifold states for which no recovery supervision exists. The authors propose PGDG, a data generation framework that iterates between a physics-grounded sampler and a dataset curator to expand a single demonstration into a compact dataset of diverse, successful, and physically plausible recovery trajectories. PGDG further applies short-horizon sampling-based control to relabel selected risky states with corrective actions. The curation is zero-shot and requires no additional human labeling. Across four bimanual manipulation tasks, PGDG outperforms spatial-only augmentation in both simulation and zero-shot real-world transfer; on RotateBox-Pitch, success improves from 38% to 93% in simulation and from 35% to 82% in the real world. PGDG also improves fine-tuning of foundation models such as GR00T, raising success from 46% to 77%.
📝 Abstract
Behavior cloning for contact-rich bimanual manipulation remains challenging because diverse demonstrations are expensive to collect, and even small disturbances can push the system into off-manifold states where no recovery supervision is available. We propose PGDG, a data generation framework with zero-shot curation that expands a single demonstration into a compact dataset of physically plausible, successful, and diverse recovery behaviors without additional human labeling. PGDG iterates between a physics-grounded sampler and a dataset curator: the curator selects informative, non-redundant, and recoverable behaviors and uses them to shift the sampling distribution toward under-covered recovery modes, while the sampler draws physically plausible rollout candidates from the updated distribution and retains the successful trajectories. To further improve data quality, PGDG applies short-horizon sampling-based control to relabel selected risky states with corrective actions. Across four bimanual manipulation tasks, PGDG consistently outperforms spatial-only augmentation in both simulation and zero-shot real-world transfer. On RotateBox-Pitch, success improves from 38% to 93% in simulation and from 35% to 82% in the real world. PGDG also enables effective fine-tuning of foundation models such as GR00T, increasing success from 46% to 77%. Additional results are available on our website: https://cunxid.github.io/PGDG/.
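The alternation between sampler and curator described in the abstract can be illustrated with a toy one-dimensional sketch. This is not the authors' implementation. Every name, threshold, and rule below is a hypothetical stand-in: `rollout_succeeds` replaces a physics rollout with a simple bound check, the curator uses a minimum-distance rule as a proxy for "non-redundant", and the weight update is inverse-count reweighting over perturbation bins. The relabeling step with short-horizon sampling-based control is omitted.

```python
import random

def rollout_succeeds(perturbation: float) -> bool:
    # Hypothetical stand-in for a physics rollout: assume recovery
    # succeeds only when the perturbation is small enough.
    return abs(perturbation) <= 0.8

def sample_candidates(weights, bin_edges, n, rng):
    # "Sampler": pick a perturbation bin according to the current weights,
    # draw a perturbation inside it, and keep only successful rollouts.
    kept = []
    for b in rng.choices(range(len(weights)), weights=weights, k=n):
        p = rng.uniform(bin_edges[b], bin_edges[b + 1])
        if rollout_succeeds(p):
            kept.append((b, p))
    return kept

def curate(dataset, candidates, min_gap):
    # "Curator": accept a candidate only if it is at least min_gap away
    # from everything already in the dataset (assumed redundancy proxy).
    for b, p in candidates:
        if all(abs(p - q) >= min_gap for _, q in dataset):
            dataset.append((b, p))
    return dataset

def update_weights(dataset, n_bins):
    # Shift sampling mass toward bins with few curated samples.
    counts = [0] * n_bins
    for b, _ in dataset:
        counts[b] += 1
    raw = [1.0 / (1 + c) for c in counts]
    total = sum(raw)
    return [r / total for r in raw]

def generate(n_iters=5, n_per_iter=20, n_bins=4, min_gap=0.05, seed=0):
    rng = random.Random(seed)
    bin_edges = [-1.0 + 2.0 * i / n_bins for i in range(n_bins + 1)]
    weights = [1.0 / n_bins] * n_bins  # start from a uniform distribution
    dataset = []
    for _ in range(n_iters):
        candidates = sample_candidates(weights, bin_edges, n_per_iter, rng)
        dataset = curate(dataset, candidates, min_gap)
        weights = update_weights(dataset, n_bins)
    return dataset, weights

if __name__ == "__main__":
    data, w = generate()
    print(len(data), [round(x, 3) for x in w])
```

In this toy setting the curated set stays compact because near-duplicates are rejected, and bins that remain sparsely covered receive more sampling weight in the next iteration. A real system would replace the bound check with simulator rollouts and the distance rule with whatever informativeness and recoverability criteria the method actually uses.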