DeformPAM: Data-Efficient Learning for Long-horizon Deformable Object Manipulation via Preference-based Action Alignment

📅 2024-10-15
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address data inefficiency, distributional shift, and error accumulation in imitation learning for long-horizon deformable object manipulation, this paper proposes a novel framework integrating implicit reward modeling from human preferences with diffusion-based action primitive alignment. Methodologically, it unifies 3D point-cloud perception, conditional diffusion models for multimodal action distribution representation, preference-based implicit reward learning, and a multi-candidate action rescoring and optimal selection mechanism. The framework achieves stable, high-quality long-horizon policy generation from only a few demonstrations. Evaluated on three real-world deformable object manipulation tasks, it significantly improves task success rates and manipulation quality over existing imitation learning baselines. Notably, it is the first to achieve high-robustness, long-horizon deformable manipulation under few-shot settings.

📝 Abstract
In recent years, imitation learning has made progress in the field of robotic manipulation. However, it still faces challenges when addressing complex long-horizon tasks with deformable objects, such as high-dimensional state spaces, complex dynamics, and multimodal action distributions. Traditional imitation learning methods often require a large amount of data and suffer from distributional shift and error accumulation in these tasks. To address these issues, we propose a data-efficient general learning framework (DeformPAM) based on preference learning and reward-guided action selection. DeformPAM decomposes long-horizon tasks into multiple action primitives, utilizes 3D point cloud inputs and diffusion models to model action distributions, and trains an implicit reward model using human preference data. During the inference phase, the reward model scores multiple candidate actions, selecting the optimal action for execution, thereby reducing the occurrence of anomalous actions and improving task completion quality. Experiments conducted on three challenging real-world long-horizon deformable object manipulation tasks demonstrate the effectiveness of this method. Results show that DeformPAM improves both task completion quality and efficiency compared to baseline methods even with limited data. Code and data will be available at https://deform-pam.robotflow.ai.
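The inference-time mechanism described in the abstract (sample several candidate actions from the diffusion policy, score them with the learned reward model, execute the best) can be sketched as a simple sample-and-rescore loop. The callables `sample_action` and `reward_model` below are hypothetical stand-ins for the paper's components, not its actual API:

```python
def select_best_action(sample_action, reward_model, num_candidates=8):
    """Reward-guided action selection: sample candidates, keep the top-scoring one.

    sample_action: callable drawing one action from a stochastic policy
                   (e.g. a conditional diffusion model) — hypothetical stand-in.
    reward_model:  callable scoring a single action — hypothetical stand-in.
    """
    # Draw several candidate actions from the multimodal policy.
    candidates = [sample_action() for _ in range(num_candidates)]
    # Rescore each candidate and execute only the highest-reward one,
    # filtering out anomalous low-reward samples.
    return max(candidates, key=reward_model)
```

The key design point is that the policy itself stays unchanged; the reward model acts purely as a test-time filter over the policy's own samples.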
Problem

Research questions and friction points this paper is trying to address.

Addresses challenges in long-horizon deformable object manipulation tasks.
Reduces data requirements and errors in imitation learning methods.
Improves task completion quality and efficiency with limited data.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Preference-based action alignment for efficiency
3D point cloud and diffusion model integration
Implicit reward model for optimal action selection
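A common way to train an implicit reward model from pairwise human preferences is a Bradley-Terry style objective, where the probability that one action beats another is the sigmoid of their reward difference. Whether DeformPAM uses exactly this loss is an assumption here; this is the standard formulation, shown for a single labeled pair:

```python
import math

def preference_loss(r_preferred, r_rejected):
    """Bradley-Terry negative log-likelihood for one preference pair.

    r_preferred / r_rejected: scalar rewards assigned by the model to the
    human-preferred and human-rejected action (illustrative, not the
    paper's exact objective).
    """
    # P(preferred beats rejected) = sigmoid(r_preferred - r_rejected)
    p = 1.0 / (1.0 + math.exp(-(r_preferred - r_rejected)))
    return -math.log(p)
```

Minimizing this loss over a dataset of labeled pairs pushes the model to assign higher reward to preferred actions, without ever requiring absolute reward labels.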
👥 Authors
Wendi Chen (Ph.D. Student, Shanghai Jiao Tong University; Robot Learning, Embodied AI, Machine Learning)
Han Xue (Shanghai Jiao Tong University, China)
Fangyuan Zhou (Shanghai Jiao Tong University, China)
Yuan Fang (Shanghai Jiao Tong University, China)
Cewu Lu (Shanghai Jiao Tong University, China)