Disentangling perception and reasoning for improving data efficiency in learning cloth manipulation without demonstrations

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of low data efficiency and high computational cost in cloth manipulation, which arise from the high-dimensional state space, complex dynamics, and self-occlusions inherent to deformable objects. To overcome these issues, the authors propose a modular reinforcement learning approach that decouples perception and reasoning modules, abandoning the conventional end-to-end image-input paradigm in favor of a structured state representation. This design enables efficient training of lightweight models in simulation, substantially improving both sample efficiency and transferability. The resulting policy is successfully deployed on real-world cloth manipulation tasks. Evaluated on the SoftGym benchmark, the method achieves superior performance over existing baselines despite using a significantly smaller model, demonstrating efficient policy learning and robust cross-domain transfer.
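The modular design described above can be illustrated with a minimal sketch: a perception module reduces a raw workspace image to a compact structured state (e.g., cloth keypoint coordinates), and a small reasoning module maps that state to an action. This is a hypothetical illustration only; the function names, dimensions, and the placeholder keypoint detector are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def perceive(image: np.ndarray, n_keypoints: int = 4) -> np.ndarray:
    """Stand-in perception module: reduce an H x W x C image to a
    structured state of keypoint coordinates. A real module would be a
    trained detector; here we return placeholder normalized coordinates
    purely for illustration."""
    pts = rng.uniform(0.0, 1.0, size=(n_keypoints, 2))  # normalized (x, y)
    return pts.reshape(-1)  # structured state vector, shape (2 * n_keypoints,)

class TinyPolicy:
    """Lightweight reasoning module: a one-hidden-layer MLP, far smaller
    than the image-input networks used by end-to-end approaches."""
    def __init__(self, state_dim: int, action_dim: int = 4, hidden: int = 32):
        self.w1 = rng.normal(0.0, 0.1, size=(state_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, size=(hidden, action_dim))

    def act(self, state: np.ndarray) -> np.ndarray:
        h = np.tanh(state @ self.w1)
        return np.tanh(h @ self.w2)  # e.g., pick (x, y) and place (x, y)

image = np.zeros((64, 64, 3))            # dummy workspace image
state = perceive(image)                  # perception: image -> structured state
policy = TinyPolicy(state_dim=state.size)
action = policy.act(state)               # reasoning: state -> action
print(action.shape)                      # (4,)
```

Because the policy consumes a low-dimensional state rather than pixels, the reasoning module can stay small and fast to train, which is the sample-efficiency argument the summary makes.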

📝 Abstract
Cloth manipulation is a ubiquitous task in everyday life, but it remains an open challenge for robotics. The difficulties in developing cloth manipulation policies are attributed to the high-dimensional state space, complex dynamics, and high propensity to self-occlusion exhibited by fabrics. As analytical methods have not been able to provide robust and general manipulation policies, reinforcement learning (RL) is considered a promising approach to these problems. However, to address the large state space and complex dynamics, data-based methods usually rely on large models and long training times. The resulting computational cost significantly hampers the development and adoption of these methods. Additionally, due to the challenge of robust state estimation, garment manipulation policies often adopt an end-to-end learning approach with workspace images as input. While this approach enables a conceptually straightforward sim-to-real transfer via real-world fine-tuning, it also incurs a significant computational cost by training agents on a highly lossy representation of the environment state. This paper questions this common design choice by exploring an efficient and modular approach to RL for cloth manipulation. We show that, through careful design choices, model size and training time can be significantly reduced when learning in simulation. Furthermore, we demonstrate how the resulting simulation-trained model can be transferred to the real world. We evaluate our approach on the SoftGym benchmark and achieve significant performance improvements over available baselines on our task, while using a substantially smaller model.
Problem

Research questions and friction points this paper is trying to address.

cloth manipulation
data efficiency
reinforcement learning
state representation
sim-to-real transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

disentangled perception and reasoning
data-efficient reinforcement learning
cloth manipulation
sim-to-real transfer
modular RL