🤖 AI Summary
Data collection for multi-step manipulation tasks on dual-arm mobile robots is costly and challenging due to trade-offs between base reachability and camera visibility. Method: We propose MoMaGen, a constrained-optimization method for automatic synthetic data generation in simulation. Guided by minimal human demonstrations, it formulates trajectory synthesis as a unified optimization problem that enforces hard constraints (e.g., manipulator kinematic feasibility and base reachability) while balancing soft constraints (e.g., visual observability during navigation). Contribution/Results: Evaluated on four complex multi-step tasks, the approach learns feasible policies from a single source demonstration and deploys successfully on a real robot after fine-tuning with only 40 real-world trajectories, significantly outperforming purely simulation-trained baselines. By casting multi-step mobile manipulation data generation as constrained optimization, the framework generalizes prior approaches and offers a principled, low-cost foundation for imitation-learning-driven embodied intelligence.
📝 Abstract
Imitation learning from large-scale, diverse human demonstrations has proven effective for training robots, but collecting such data is costly and time-consuming. This challenge is amplified for multi-step bimanual mobile manipulation, where humans must teleoperate both a mobile base and two high-degree-of-freedom arms. Prior automated data generation frameworks have addressed static bimanual manipulation by augmenting a few human demonstrations in simulation, but they fall short for mobile settings due to two key challenges: (1) determining base placement to ensure reachability, and (2) positioning the camera to provide sufficient visibility for visuomotor policies. To address these issues, we introduce MoMaGen, which formulates data generation as a constrained optimization problem that enforces hard constraints (e.g., reachability) while balancing soft constraints (e.g., visibility during navigation). This formulation generalizes prior approaches and provides a principled foundation for future methods. We evaluate MoMaGen on four multi-step bimanual mobile manipulation tasks and show that it generates significantly more diverse datasets than existing methods. Leveraging this diversity, MoMaGen can train successful imitation learning policies from a single source demonstration, and these policies can be fine-tuned with as few as 40 real-world demonstrations to achieve deployment on physical robotic hardware. More details are available at our project page: momagen.github.io.
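The hard/soft constraint split described in the abstract can be illustrated with a toy sketch: filter candidate base placements by a hard reachability constraint, then pick the one minimizing a soft visibility cost. All names, geometry, and weights below are illustrative assumptions, not details from the paper:

```python
import math

# Hypothetical setup (assumptions for illustration only):
ARM_REACH = 0.9          # hard constraint: object must lie within arm reach (m)
OBJECT = (2.0, 1.0)      # object position in the world frame (m)
VIS_WEIGHT = 0.5         # weight on the soft visibility term

def reachable(base, obj=OBJECT, reach=ARM_REACH):
    """Hard constraint: object within Euclidean arm reach of the base."""
    return math.dist(base, obj) <= reach

def visibility_cost(base, obj=OBJECT):
    """Soft constraint: penalize placements where the object lies far off the
    camera's assumed forward axis (+x); zero when the object is dead ahead."""
    dx, dy = obj[0] - base[0], obj[1] - base[1]
    return abs(math.atan2(dy, dx))

def best_base_placement(candidates):
    """Enforce the hard constraint as a filter, then minimize the soft cost."""
    feasible = [b for b in candidates if reachable(b)]
    if not feasible:
        return None
    return min(feasible, key=lambda b: VIS_WEIGHT * visibility_cost(b))

# Coarse grid of candidate base poses (0.1 m spacing).
candidates = [(x / 10, y / 10) for x in range(30) for y in range(30)]
best = best_base_placement(candidates)
```

This mirrors the formulation only in spirit: MoMaGen solves a unified optimization over full trajectories, whereas this sketch does a discrete search over static 2-D placements.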