CLASP: General-Purpose Clothes Manipulation with Semantic Keypoints

📅 2025-07-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing garment manipulation approaches suffer from poor generalization, failing to unify diverse garment types (e.g., T-shirts, dresses) and heterogeneous tasks (e.g., folding, flattening, hanging). This work proposes a semantics-driven, keypoint-based framework for universal garment manipulation: garments are abstracted as sparse spatial-semantic keypoints, bridging high-level task planning and low-level action execution. The framework integrates RGB-D perception and vision-language models (VLMs) for cross-task semantic understanding, and invokes a pre-compiled skill library for embodied control. To our knowledge, this is the first approach establishing a unified manipulation paradigm across garment categories and tasks. We validate its strong generalization and robustness in simulation and on a Franka dual-arm robot, demonstrating significant performance gains over state-of-the-art methods and highlighting its potential for real-world deployment.

📝 Abstract
Clothes manipulation, such as folding or hanging, is a critical capability for home service robots. Despite recent advances, most existing methods remain limited to specific tasks and clothes types, due to the complex, high-dimensional geometry of clothes. This paper presents CLothes mAnipulation with Semantic keyPoints (CLASP), which aims at general-purpose clothes manipulation across different clothes types (e.g., T-shirts, shorts, skirts, long dresses) and different tasks (e.g., folding, flattening, hanging). The core idea of CLASP is semantic keypoints -- e.g., "left sleeve", "right shoulder" -- a sparse spatial-semantic representation that is salient for both perception and action. Semantic keypoints can be reliably extracted from RGB-D images and provide an effective intermediate representation for clothes manipulation policies. CLASP uses semantic keypoints to bridge high-level task planning and low-level action execution: at the high level, it exploits vision-language models (VLMs) to predict task plans over the semantic keypoints; at the low level, it executes the plans with the help of a simple pre-built manipulation skill library. Extensive simulation experiments show that CLASP outperforms state-of-the-art baseline methods on multiple tasks across diverse clothes types, demonstrating strong performance and generalization. Further experiments with a Franka dual-arm system on four distinct tasks -- folding, flattening, hanging, and placing -- confirm CLASP's performance on a real robot.
Problem

Research questions and friction points this paper is trying to address.

General-purpose clothes manipulation for diverse types and tasks
Bridging high-level planning and low-level execution via semantic keypoints
Improving performance and generalization in robotic clothes handling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses semantic keypoints for clothes representation
Bridges planning and execution with keypoints
Leverages vision language models for task planning
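The three-stage pipeline described above (keypoint perception, VLM task planning, skill-library execution) can be sketched as follows. This is a hypothetical illustration, not the authors' code: all names (`detect_keypoints`, `plan_with_vlm`, `SKILLS`) and signatures are assumptions, and the perception and VLM calls are replaced by stubs.

```python
# Hypothetical sketch of a CLASP-style pipeline; all names and
# signatures are illustrative assumptions, not the authors' code.
from dataclasses import dataclass


@dataclass
class Keypoint:
    name: str                             # e.g. "left sleeve", "right shoulder"
    position: tuple[float, float, float]  # 3D point recovered from RGB-D


def detect_keypoints(rgbd_image) -> list[Keypoint]:
    """Stand-in for the perception module: extract semantic keypoints.

    A real system would run a learned detector on the RGB-D input;
    here we return fixed keypoints for a flattened T-shirt."""
    return [
        Keypoint("left sleeve", (0.10, 0.40, 0.02)),
        Keypoint("right sleeve", (0.50, 0.40, 0.02)),
        Keypoint("hem", (0.30, 0.05, 0.02)),
    ]


def plan_with_vlm(task: str, keypoint_names: list[str]) -> list[tuple[str, str, str]]:
    """Stand-in for VLM planning: map a task to (skill, from_kp, to_kp) steps.

    A real system would prompt a VLM with the task description and the
    detected keypoint names; here the "plan" is hard-coded."""
    if task == "fold" and "left sleeve" in keypoint_names:
        return [
            ("pick_and_place", "left sleeve", "hem"),
            ("pick_and_place", "right sleeve", "hem"),
        ]
    return []


# Pre-built skill library: skill name -> low-level executor (stubbed).
SKILLS = {
    "pick_and_place": lambda src, dst: f"pick {src.name}, place at {dst.name}",
}


def execute(task: str, rgbd_image) -> list[str]:
    """Run perception, plan over keypoint names, dispatch to the skill library."""
    kps = {kp.name: kp for kp in detect_keypoints(rgbd_image)}
    plan = plan_with_vlm(task, list(kps))
    return [SKILLS[skill](kps[src], kps[dst]) for skill, src, dst in plan]


print(execute("fold", rgbd_image=None))
# → ['pick left sleeve, place at hem', 'pick right sleeve, place at hem']
```

The sparse keypoint interface is what decouples the two levels: the VLM only reasons over semantic names, while the skill library only consumes 3D positions.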
Yuhong Deng
PhD student in Computer Science, National University of Singapore
Robotics · Robotic Manipulation · Robot Learning
Chao Tang
School of Computing, Smart System Institute, National University of Singapore, Singapore and Department of Electronic and Electrical Engineering, Southern University of Science and Technology, China
Cunjun Yu
National University of Singapore
Robotics · Human-Robot Interaction · Autonomous Driving
Linfeng Li
School of Computing, Smart System Institute, National University of Singapore, Singapore
David Hsu
Professor of Computer Science, National University of Singapore
Robotics · AI · Computational Biology