Planning-Guided Diffusion Policy Learning for Generalizable Contact-Rich Bimanual Manipulation

📅 2024-12-03

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address scarce demonstration data and poor policy generalization in contact-rich bimanual manipulation, this paper proposes GLIDE (Generalizable Planning-Guided Diffusion Policy Learning). GLIDE uses model-based motion planners that plan through contacts to generate large-scale, high-quality synthetic trajectories in randomized, high-fidelity physics simulation, then trains a task-conditioned diffusion policy on these demonstrations via behavior cloning. To narrow the sim-to-real gap, it adopts a set of design choices in feature extraction, task representation, action prediction, and data augmentation that support smooth action-sequence prediction and generalization to unseen scenarios. Experiments in simulation and on a real bimanual robot show that the learned policy can manipulate objects of diverse geometries, dimensions, and physical properties.

📝 Abstract

Contact-rich bimanual manipulation involves precise coordination of two arms to change object states through strategically selected contacts and motions. Due to the inherent complexity of these tasks, acquiring sufficient demonstration data and training policies that generalize to unseen scenarios remain a largely unresolved challenge. Building on recent advances in planning through contacts, we introduce Generalizable Planning-Guided Diffusion Policy Learning (GLIDE), an approach that effectively learns to solve contact-rich bimanual manipulation tasks by leveraging model-based motion planners to generate demonstration data in high-fidelity physics simulation. Through efficient planning in randomized environments, our approach generates large-scale and high-quality synthetic motion trajectories for tasks involving diverse objects and transformations. We then train a task-conditioned diffusion policy via behavior cloning using these demonstrations. To tackle the sim-to-real gap, we propose a set of essential design options in feature extraction, task representation, action prediction, and data augmentation that enable learning robust prediction of smooth action sequences and generalization to unseen scenarios. Through experiments in both simulation and the real world, we demonstrate that our approach can enable a bimanual robotic system to effectively manipulate objects of diverse geometries, dimensions, and physical properties. Website: https://glide-manip.github.io/
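The behavior-cloning step described in the abstract can be illustrated with a minimal sketch of a noise-prediction diffusion loss on one planner-generated action chunk. This is not the authors' implementation: the abstract does not specify the diffusion formulation, so the DDPM-style forward process, the linear beta schedule, the flattened action vector, and every name below (`predict_noise`, `planner_actions`, `observation`, `task`) are assumptions chosen for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed, commonly used default)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(actions, noise, alpha_bar_t):
    """Forward process: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * e for x, e in zip(actions, noise)]


def bc_diffusion_loss(predict_noise, actions, observation, task, alpha_bars, rng):
    """Single-sample noise-prediction loss for one demonstrated action chunk.

    predict_noise is a hypothetical stand-in for the policy network; it is
    conditioned on the observation and the task description.
    """
    t = rng.randrange(len(alpha_bars))              # random diffusion step
    noise = [rng.gauss(0.0, 1.0) for _ in actions]  # target noise
    noisy_actions = add_noise(actions, noise, alpha_bars[t])
    predicted = predict_noise(noisy_actions, t, observation, task)
    return sum((p - e) ** 2 for p, e in zip(predicted, noise)) / len(actions)


# Hypothetical inputs: made-up values standing in for one planner demonstration.
rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(100))
planner_actions = [0.1, -0.2, 0.05, 0.3]  # flattened action chunk
observation = [0.0, 0.0, 0.0]             # placeholder state features
task = [1.0, 0.0]                         # placeholder task conditioning


def zero_predictor(noisy_actions, t, observation, task):
    """Untrained placeholder that always predicts zero noise."""
    return [0.0] * len(noisy_actions)


loss = bc_diffusion_loss(zero_predictor, planner_actions, observation, task, alpha_bars, rng)
```

In a real training loop the placeholder predictor would be a learned network, and this loss would be averaged over batches of planner trajectories and minimized by gradient descent. The sketch only shows how a demonstrated action chunk, its conditioning, and a sampled noise level combine into one training signal.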

Problem

Research questions and friction points this paper is trying to address.

Generalizable bimanual manipulation learning

Planning-guided diffusion policy for contact-rich tasks

Sim-to-real gap in robotic manipulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Planning-Guided Diffusion Policy Learning

High-fidelity physics simulation data

Sim-to-real generalization techniques
