DefFusionNet: Learning Multimodal Goal Shapes for Deformable Object Manipulation via a Diffusion-based Probabilistic Model

📅 2025-06-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
In deformable object manipulation, acquiring target shapes typically relies on manual specification and breaks down in multimodal tasks, where multiple valid goal configurations exist. Method: This paper introduces the first diffusion-probabilistic-model-based framework for generating goal shapes. Unlike conventional shape servoing, which requires a single predefined goal, or DefGoalNet, which collapses multiple modes into an averaged shape, the approach learns the goal-shape distribution from a small number of human demonstrations and uses a diffusion process to sample diverse, high-fidelity multimodal goal shapes. The authors integrate diffusion modeling into deformable object manipulation through an end-to-end generative network that combines visual feedback with probabilistic modeling for controllable shape generation. Results: Evaluated in simulation and on real robotic platforms (surgical suturing and sewing), the method significantly improves the diversity, plausibility, and task success rate of generated goal shapes, providing a scalable generative foundation for dexterous manipulation.
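The paper's exact network architecture and noise schedule are not given in this summary. As a rough illustration of the core idea, the sketch below shows standard DDPM ancestral sampling of a goal point cloud conditioned on an observed point cloud; the `eps_model` interface, the linear beta schedule, and the point-cloud sizes are all assumptions, not details from the paper.

```python
import torch

def sample_goal_shape(eps_model, obs_cloud, n_points=256, steps=50):
    """Sample one goal point cloud via DDPM ancestral sampling,
    conditioned on an observed point cloud (hypothetical interface)."""
    # Linear beta schedule: a common default, not necessarily the paper's.
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(n_points, 3)  # start from pure Gaussian noise
    for t in reversed(range(steps)):
        eps = eps_model(x, obs_cloud, t)  # network predicts the added noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        # Add fresh noise except at the final step.
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x

# Placeholder standing in for the trained conditional noise network.
def dummy_eps_model(x, obs, t):
    return torch.zeros_like(x)

goal = sample_goal_shape(dummy_eps_model, obs_cloud=torch.randn(256, 3))
```

Because sampling starts from independent Gaussian noise, repeated calls with the same observation can yield different goal shapes, which is what lets a diffusion model represent several valid goals instead of their average.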

📝 Abstract
Deformable object manipulation is critical to many real-world robotic applications, ranging from surgical robotics and soft material handling in manufacturing to household tasks like laundry folding. At the core of this robotic field is shape servoing, a task focused on controlling deformable objects into desired shapes. The shape servoing formulation requires the specification of a goal shape. However, most prior works in shape servoing rely on impractical goal shape acquisition methods, such as laborious domain-knowledge engineering or manual manipulation. DefGoalNet, the previous state-of-the-art solution to this problem, learns deformable object goal shapes directly from a small number of human demonstrations. However, it struggles significantly in multimodal settings, where multiple distinct goal shapes can all lead to successful task completion. As a deterministic model, DefGoalNet collapses these possibilities into a single averaged solution, often resulting in an unusable goal. In this paper, we address this problem by developing DefFusionNet, a novel neural network that leverages the diffusion probabilistic model to learn a distribution over all valid goal shapes rather than predicting a single deterministic outcome. This enables the generation of diverse goal shapes and avoids averaging artifacts. We demonstrate our method's effectiveness on robotic tasks inspired by both manufacturing and surgical applications, both in simulation and on a physical robot. Our work is the first generative model capable of producing a diverse, multimodal set of deformable object goals for real-world robotic applications.
Problem

Research questions and friction points this paper is trying to address.

Learning multimodal goal shapes for deformable object manipulation
Addressing limitations of deterministic models in shape servoing
Generating diverse goal shapes using diffusion probabilistic models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses diffusion probabilistic model for shape learning
Generates diverse multi-modal goal shapes
Avoids deterministic averaging artifacts