CLASP: General-Purpose Clothes Manipulation with Semantic Keypoints

📅 2025-07-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing garment manipulation approaches suffer from poor generalization, failing to unify diverse garment types (e.g., T-shirts, dresses) and heterogeneous tasks (e.g., folding, flattening, hanging). This work proposes a semantics-driven, keypoint-based framework for universal garment manipulation: garments are abstracted as sparse spatial-semantic keypoints, bridging high-level task planning and low-level action execution. The framework integrates RGB-D perception and vision-language models (VLMs) for cross-task semantic understanding, and invokes a pre-compiled skill library for embodied control. To our knowledge, this is the first approach establishing a unified manipulation paradigm across garment categories and tasks. We validate its strong generalization and robustness in simulation and on a Franka dual-arm robot, demonstrating significant performance gains over state-of-the-art methods and highlighting its potential for real-world deployment.

📝 Abstract
Clothes manipulation, such as folding or hanging, is a critical capability for home service robots. Despite recent advances, most existing methods remain limited to specific tasks and clothes types, due to the complex, high-dimensional geometry of clothes. This paper presents CLothes mAnipulation with Semantic keyPoints (CLASP), which aims at general-purpose clothes manipulation across different clothes types (e.g., T-shirts, shorts, skirts, long dresses) and different tasks (e.g., folding, flattening, hanging). The core idea of CLASP is semantic keypoints -- e.g., "left sleeve", "right shoulder" -- a sparse spatial-semantic representation that is salient for both perception and action. Semantic keypoints can be reliably extracted from RGB-D images and provide an effective intermediate representation for clothes manipulation policies. CLASP uses semantic keypoints to bridge high-level task planning and low-level action execution: at the high level, it exploits vision-language models (VLMs) to predict task plans over the semantic keypoints; at the low level, it executes the plans with the help of a simple pre-built manipulation skill library. Extensive simulation experiments show that CLASP outperforms state-of-the-art baseline methods on multiple tasks across diverse clothes types, demonstrating strong performance and generalization. Further experiments with a Franka dual-arm system on four distinct tasks -- folding, flattening, hanging, and placing -- confirm CLASP's performance on a real robot.
Problem

Research questions and friction points this paper is trying to address.

General-purpose clothes manipulation for diverse types and tasks
Bridging high-level planning and low-level execution via semantic keypoints
Improving performance and generalization in robotic clothes handling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses semantic keypoints for clothes representation
Bridges planning and execution with keypoints
Leverages vision language models for task planning
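The three-stage pipeline described above (keypoint perception, VLM task planning, skill-library execution) can be sketched as follows. This is a hypothetical illustration, not the authors' code: all names (`detect_keypoints`, `plan_with_vlm`, `SKILLS`) and signatures are assumptions, and the perception and VLM calls are replaced by stubs.

```python
# Hypothetical sketch of a CLASP-style pipeline; all names and
# signatures are illustrative assumptions, not the authors' code.
from dataclasses import dataclass


@dataclass
class Keypoint:
    name: str                             # e.g. "left sleeve", "right shoulder"
    position: tuple[float, float, float]  # 3D point recovered from RGB-D


def detect_keypoints(rgbd_image) -> list[Keypoint]:
    """Stand-in for the perception module: extract semantic keypoints.

    A real system would run a learned detector on the RGB-D input;
    here we return fixed keypoints for a flattened T-shirt."""
    return [
        Keypoint("left sleeve", (0.10, 0.40, 0.02)),
        Keypoint("right sleeve", (0.50, 0.40, 0.02)),
        Keypoint("hem", (0.30, 0.05, 0.02)),
    ]


def plan_with_vlm(task: str, keypoint_names: list[str]) -> list[tuple[str, str, str]]:
    """Stand-in for VLM planning: map a task to (skill, from_kp, to_kp) steps.

    A real system would prompt a VLM with the task description and the
    detected keypoint names; here the "plan" is hard-coded."""
    if task == "fold" and "left sleeve" in keypoint_names:
        return [
            ("pick_and_place", "left sleeve", "hem"),
            ("pick_and_place", "right sleeve", "hem"),
        ]
    return []


# Pre-built skill library: skill name -> low-level executor (stubbed).
SKILLS = {
    "pick_and_place": lambda src, dst: f"pick {src.name}, place at {dst.name}",
}


def execute(task: str, rgbd_image) -> list[str]:
    """Run perception, plan over keypoint names, dispatch to the skill library."""
    kps = {kp.name: kp for kp in detect_keypoints(rgbd_image)}
    plan = plan_with_vlm(task, list(kps))
    return [SKILLS[skill](kps[src], kps[dst]) for skill, src, dst in plan]


print(execute("fold", rgbd_image=None))
# → ['pick left sleeve, place at hem', 'pick right sleeve, place at hem']
```

The sparse keypoint interface is what decouples the two levels: the VLM only reasons over semantic names, while the skill library only consumes 3D positions.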
Yuhong Deng
PhD student in Computer Science, National University of Singapore
Robotics · Robotic Manipulation · Robot Learning
Chao Tang
School of Computing, Smart System Institute, National University of Singapore, Singapore and Department of Electronic and Electrical Engineering, Southern University of Science and Technology, China
Cunjun Yu
National University of Singapore
Robotics · Human-Robot Interaction · Autonomous Driving
Linfeng Li
School of Computing, Smart System Institute, National University of Singapore, Singapore
David Hsu
Professor of Computer Science, National University of Singapore
Robotics · AI · Computational Biology