MetaFold: Language-Guided Multi-Category Garment Folding Framework via Trajectory Generation and Foundation Model

📅 2025-03-11
🤖 AI Summary
This work addresses key challenges in robotic garment folding: the state-space explosion caused by the many degrees of freedom of fabric, the difficulty of modeling its dynamics, and poor generalization across garment categories. We propose a language-driven, disentangled folding framework whose architecture combines high-level, language-instructed point-cloud trajectory generation with a low-level embodied manipulation foundation model. The framework integrates multimodal language embeddings, temporal point-cloud modeling, and diffusion-based trajectory generation to explicitly decouple task planning from action execution, enabling zero-shot generalization to unseen categories and fine-grained understanding of semantic instructions. In experiments on six real-world garment categories, the framework achieves an 89.2% folding success rate, significantly outperforming prior methods while showing strong generalization and robustness to diverse user instructions.
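
The summary mentions diffusion-based trajectory generation. As an illustration only, the sketch below runs a standard DDPM ancestral-sampling loop over a flattened point-cloud trajectory. The linear noise schedule, the trajectory shape (frames × points × 3), and the `denoiser` callable are assumptions, not MetaFold's published design; a real planner would condition the denoiser on the language embedding and the observed garment point cloud.

```python
import math
import random
from typing import Callable, List, Tuple

# Hypothetical noise-prediction network eps_theta(x_t, t); a real planner would
# also condition it on the instruction embedding and the observed point cloud.
Denoiser = Callable[[List[float], int], List[float]]


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances beta_1..beta_T (illustrative values)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def ddpm_sample(denoiser: Denoiser, length: int, num_steps: int = 50, seed: int = 0) -> List[float]:
    """Standard DDPM ancestral sampling on a flat vector of `length` values."""
    rng = random.Random(seed)
    betas = linear_beta_schedule(num_steps)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)  # cumulative product alpha_bar_t

    x = [rng.gauss(0.0, 1.0) for _ in range(length)]  # x_T ~ N(0, I)
    for t in reversed(range(num_steps)):
        eps = denoiser(x, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Mean of p(x_{t-1} | x_t): (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # the sigma_t^2 = beta_t variant
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added on the final step
    return x


def to_frames(flat: List[float], num_frames: int, num_points: int) -> List[List[Tuple[float, ...]]]:
    """Reshape a flat vector into num_frames point clouds of num_points xyz points."""
    if len(flat) != num_frames * num_points * 3:
        raise ValueError("length does not match frames x points x 3")
    return [
        [tuple(flat[(f * num_points + p) * 3:(f * num_points + p) * 3 + 3]) for p in range(num_points)]
        for f in range(num_frames)
    ]


def zero_denoiser(x: List[float], t: int) -> List[float]:
    # Placeholder that predicts zero noise; stands in for a trained network.
    return [0.0] * len(x)


# Sample a 4-frame trajectory of a 5-point cloud (shapes are arbitrary).
frames = to_frames(ddpm_sample(zero_denoiser, 4 * 5 * 3, num_steps=10), num_frames=4, num_points=5)
```

With a trained, conditioned denoiser in place of `zero_denoiser`, each sampled frame would be a predicted garment point cloud at a future step of the fold.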

📝 Abstract
Garment folding is a common yet challenging task in robotic manipulation. The deformability of garments leads to a vast state space and complex dynamics, which complicates precise and fine-grained manipulation. Previous approaches often rely on predefined key points or demonstrations, limiting their generalization across diverse garment categories. This paper presents a framework, MetaFold, that disentangles task planning from action prediction, learning each independently to enhance model generalization. It employs language-guided point cloud trajectory generation for task planning and a low-level foundation model for action prediction. This structure facilitates multi-category learning, enabling the model to adapt flexibly to various user instructions and folding tasks. Experimental results demonstrate the superiority of our proposed framework. Supplementary materials are available on our website: https://meta-fold.github.io/.
Problem

Addresses the challenges of garment folding in robotic manipulation.
Overcomes prior methods' reliance on predefined key points and demonstrations.
Improves generalization across diverse garment categories.
Innovation

Language-guided point cloud trajectory generation
Disentangles task planning from action prediction
Low-level foundation model for action prediction
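
To make the split between task planning and action prediction concrete, here is a minimal sketch of the interface. It assumes the planner maps an instruction and the current point cloud to a sequence of future point clouds, and the low-level model maps a (current, target) pair to a pick-and-place action. `FoldingPipeline`, `toy_planner`, and `toy_actor` are hypothetical stand-ins; the paper's actual models and action space may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]
PointCloud = List[Point]
Action = Tuple[Point, Point]  # hypothetical (pick position, place position)


@dataclass
class FoldingPipeline:
    """Hypothetical two-level pipeline with planning decoupled from action prediction."""
    planner: Callable[[str, PointCloud], List[PointCloud]]  # high level: instruction + cloud -> future clouds
    actor: Callable[[PointCloud, PointCloud], Action]       # low level: (current, target) -> action

    def run(self, instruction: str, observation: PointCloud) -> List[Action]:
        trajectory = self.planner(instruction, observation)
        actions: List[Action] = []
        current = observation
        for target in trajectory:
            actions.append(self.actor(current, target))
            # Open-loop assumption: treat each target as reached. A real system
            # would re-observe the garment after executing each action.
            current = target
        return actions


def toy_planner(instruction: str, cloud: PointCloud, steps: int = 3) -> List[PointCloud]:
    # Stand-in planner that ignores the instruction and shifts every point
    # along +x by 0.1 per frame.
    return [[(x + 0.1 * k, y, z) for (x, y, z) in cloud] for k in range(1, steps + 1)]


def toy_actor(current: PointCloud, target: PointCloud) -> Action:
    # Stand-in actor: pick the point that must move farthest and place it at its target.
    def dist2(a: Point, b: Point) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    i = max(range(len(current)), key=lambda j: dist2(current[j], target[j]))
    return current[i], target[i]


pipeline = FoldingPipeline(planner=toy_planner, actor=toy_actor)
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 1.0, 0.0)]
actions = pipeline.run("fold the left sleeve", cloud)
```

Because the two levels communicate only through point clouds, either side can be replaced (for example, a planner trained on a new garment category) without touching the other, which is the intuition behind learning each part independently.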
👥 Authors
Haonan Chen
School of Computing, National University of Singapore; NUS Guangzhou Research Translation and Innovation Institute
Junxiao Li
School of Computer Science, Nanjing University
Ruihai Wu
Peking University
computer vision, robotics
Yiwei Liu
Defence Industry Secrecy Examination and Certification Center
Information Theorem, Social network, Privacy Protection
Yiwen Hou
National University of Singapore
Reinforcement Learning
Zhixuan Xu
National University of Singapore
Robotics
Jingxiang Guo
National University of Singapore
Manipulation
Chongkai Gao
School of Computing, National University of Singapore
Zhenyu Wei
Westlake University
Robot, Signal Processing, Reinforcement Learning, Circuit Design, Compressed Sensing
Shensi Xu
School of Computer Science, Nanjing University
Jiaqi Huang
University of Central Missouri
Cybersecurity, IoV
Lin Shao
School of Computing, National University of Singapore; NUS Guangzhou Research Translation and Innovation Institute