MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation

📅 2025-10-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Data collection for multi-step manipulation tasks on dual-arm mobile robots is costly and challenging due to the trade-off between base reachability and camera visibility. Method: We propose a constraint-optimization-based method for automatic synthetic data generation in simulation, formulating trajectory synthesis as a unified optimization problem that integrates hard constraints (e.g., manipulator kinematic feasibility and base stability) with soft constraints (e.g., visual observability during navigation), guided by minimal human demonstrations. Contribution/Results: Evaluated on four complex multi-step tasks, the approach enables learning feasible policies from a single demonstration and achieves successful real-robot deployment after fine-tuning with only 40 real-world trajectories, significantly outperforming purely simulation-trained baselines. This work introduces a constrained-optimization framework for multi-step mobile manipulation data generation that generalizes prior approaches, establishing a new paradigm for low-cost, imitation-learning-driven embodied intelligence.

📝 Abstract
Imitation learning from large-scale, diverse human demonstrations has proven effective for training robots, but collecting such data is costly and time-consuming. This challenge is amplified for multi-step bimanual mobile manipulation, where humans must teleoperate both a mobile base and two high-degree-of-freedom arms. Prior automated data generation frameworks have addressed static bimanual manipulation by augmenting a few human demonstrations in simulation, but they fall short for mobile settings due to two key challenges: (1) determining base placement to ensure reachability, and (2) positioning the camera to provide sufficient visibility for visuomotor policies. To address these issues, we introduce MoMaGen, which formulates data generation as a constrained optimization problem that enforces hard constraints (e.g., reachability) while balancing soft constraints (e.g., visibility during navigation). This formulation generalizes prior approaches and provides a principled foundation for future methods. We evaluate MoMaGen on four multi-step bimanual mobile manipulation tasks and show that it generates significantly more diverse datasets than existing methods. Leveraging this diversity, MoMaGen can train successful imitation learning policies from a single source demonstration, and these policies can be fine-tuned with as few as 40 real-world demonstrations to achieve deployment on physical robotic hardware. More details are available at our project page: momagen.github.io.
Problem

Research questions and friction points this paper is trying to address.

Generating diverse demonstrations for bimanual mobile manipulation tasks
Solving base placement and camera positioning constraints in data generation
Enabling imitation learning from minimal real-world demonstrations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Formulates data generation as a constrained optimization problem
Enforces hard constraints (e.g., reachability) while balancing soft constraints (e.g., visibility)
Generates diverse datasets from a single source demonstration
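The core idea above (hard constraints filter infeasible base placements; soft constraints score the survivors) can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the reach radius, field-of-view, planar pose model, and the rejection-sampling loop are all simplifying assumptions made here for illustration.

```python
import math
import random

ARM_REACH = 1.0                 # assumed max arm reach (m), hypothetical
CAMERA_HALF_FOV = math.radians(60)  # assumed camera half field-of-view, hypothetical

def reachable(base, target):
    """Hard constraint: target must lie within arm reach of the base (x, y)."""
    dx, dy = target[0] - base[0], target[1] - base[1]
    return math.hypot(dx, dy) <= ARM_REACH

def visibility_cost(base, target):
    """Soft constraint: penalize angular offset of the target from the camera axis."""
    dx, dy = target[0] - base[0], target[1] - base[1]
    offset = abs(math.atan2(dy, dx) - base[2])     # base[2] is heading theta
    offset = min(offset, 2 * math.pi - offset)     # wrap to [0, pi]
    return max(0.0, offset - CAMERA_HALF_FOV)      # zero cost inside the FOV

def sample_base_pose(target, n_samples=2000, seed=0):
    """Rejection-sample (x, y, theta) poses; hard constraints filter first,
    then the feasible pose with the lowest soft cost is kept."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(n_samples):
        pose = (rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0),
                rng.uniform(-math.pi, math.pi))
        if not reachable(pose, target):   # hard constraint: discard outright
            continue
        cost = visibility_cost(pose, target)  # soft constraint: rank survivors
        if cost < best_cost:
            best, best_cost = pose, cost
    return best, best_cost

pose, cost = sample_base_pose(target=(0.5, 0.3))
```

The asymmetry mirrors the formulation in the abstract: a pose violating a hard constraint is rejected regardless of how good its view is, while visibility only trades off among poses that are already kinematically feasible.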