AI Summary
Existing robotic manipulation datasets struggle to simultaneously achieve scalability, diversity, and high quality, due to challenges in sim-to-real transfer, high labor costs for manual data collection, and inherent limitations in behavioral diversity. To address this, we propose FieldGen, a field-guided data generation framework that decouples manipulation into pre-manipulation and fine-manipulation stages. FieldGen synergistically integrates a small set of high-fidelity human demonstrations with attraction-field-based automated trajectory generation, enabling scalable, diverse, and high-quality dataset construction. Furthermore, it incorporates reward labeling to enhance policy learning. Experiments demonstrate that policies trained with FieldGen achieve significantly higher success rates and stability on real-world tasks compared to teleoperation baselines, while reducing long-term human data-collection effort by approximately 70%.
Abstract
Large-scale and diverse datasets are vital for training robust robotic manipulation policies, yet existing data collection methods struggle to balance scale, diversity, and quality. Simulation offers scalability but suffers from sim-to-real gaps, while teleoperation yields high-quality demonstrations with limited diversity and high labor cost. We introduce FieldGen, a field-guided data generation framework that enables scalable, diverse, and high-quality real-world data collection with minimal human supervision. FieldGen decomposes manipulation into two stages: a pre-manipulation phase, allowing trajectory diversity, and a fine manipulation phase requiring expert precision. Human demonstrations capture key contact and pose information, after which an attraction field automatically generates diverse trajectories converging to successful configurations. This decoupled design combines scalable trajectory diversity with precise supervision. Moreover, FieldGen-Reward augments generated data with reward annotations to further enhance policy learning. Experiments demonstrate that policies trained with FieldGen achieve higher success rates and improved stability compared to teleoperation-based baselines, while significantly reducing human effort in long-term real-world data collection. Webpage is available at https://fieldgen.github.io/.
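As a rough illustration of the attraction-field idea described above, the following minimal sketch samples varied start positions around a demonstrated target and pulls each one toward it. It is not the authors' implementation: the linear field, the position-only 3D state, and all names (`generate_trajectory`, `sample_trajectories`, `gain`, `radius`) are assumptions made for illustration.

```python
import math
import random


def generate_trajectory(start, goal, gain=0.2, tol=1e-3, max_steps=200):
    """Follow an assumed linear attraction field v = gain * (goal - p).

    Starts at `start` and steps toward `goal` until the remaining
    distance drops below `tol` or `max_steps` is reached.
    """
    point = list(start)
    path = [tuple(point)]
    for _ in range(max_steps):
        delta = [g - p for g, p in zip(goal, point)]
        if math.sqrt(sum(d * d for d in delta)) < tol:
            break
        # Each step shrinks the remaining distance by a factor (1 - gain).
        point = [p + gain * d for p, d in zip(point, delta)]
        path.append(tuple(point))
    return path


def sample_trajectories(goal, n, radius=0.3, seed=0):
    """Generate n trajectories from random starts in a cube around goal."""
    rng = random.Random(seed)
    trajectories = []
    for _ in range(n):
        start = tuple(g + rng.uniform(-radius, radius) for g in goal)
        trajectories.append(generate_trajectory(start, goal))
    return trajectories


# Hypothetical target position, standing in for a pose taken from a human demonstration.
demo_goal = (0.5, 0.0, 0.2)
trajectories = sample_trajectories(demo_goal, n=5)
```

In this sketch, diversity comes only from the randomized start positions, while convergence to the demonstrated target is guaranteed by the field; a real system would also need to handle orientation, collisions, and robot kinematics.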