🤖 AI Summary
This work addresses the limited generalization of existing deformable object tracing methods, which often rely on object-specific modeling or suffer from simulation-to-reality transfer challenges. The authors propose a visual-tactile imitation learning framework that, for the first time, enables unified tracing of both one-dimensional and two-dimensional deformable objects within a single architecture. The approach introduces a local tactile centroid-weighted loss to emphasize critical contact regions and incorporates a global task-progress modulation mechanism to refine tracing dynamics. Additionally, a low-cost visuo-tactile teleoperation system is developed to collect training data. Experimental results demonstrate an average success rate of 80% on seen objects and 65% on unseen objects, significantly improving cross-object generalization compared to prior methods.
📄 Abstract
Deformable objects often appear in unstructured configurations. Tracing deformable objects helps bring them into extended states and facilitates downstream manipulation tasks. Due to the requirements for object-specific modeling or sim-to-real transfer, existing tracing methods either lack generalizability across different categories of deformable objects or struggle to complete tasks reliably in the real world. To address this, we propose a novel visual-tactile imitation learning method that achieves one-dimensional (1D) and two-dimensional (2D) deformable object tracing with a unified model. Our method is designed from both local and global perspectives based on visual and tactile sensing. Locally, we introduce a weighted loss that emphasizes actions maintaining contact near the center of the tactile image, improving fine-grained adjustment. Globally, we propose a tracing task loss that helps the policy regulate task progression. On the hardware side, to compensate for the limited features extracted from visual information, we integrate tactile sensing into a low-cost teleoperation system, considering both the teleoperator and the robot. Extensive ablation and comparative experiments on diverse 1D and 2D deformable objects demonstrate the effectiveness of our approach, achieving an average success rate of 80% on seen objects and 65% on unseen objects.
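The abstract's local weighted loss can be pictured as scaling the imitation loss by how close the contact centroid sits to the center of the tactile image. The paper does not give the exact formula, so the sketch below is only an illustrative interpretation: `centroid_weight`, the Gaussian falloff, and the `sigma`/`gain` parameters are all assumptions, not the authors' implementation.

```python
import numpy as np

def centroid_weight(tactile_img, sigma=0.25):
    """Return a weight in (0, 1]: high when the contact centroid of the
    tactile image lies near the image center, low when contact drifts to
    the edges. (Hypothetical formulation; sigma controls the falloff.)"""
    h, w = tactile_img.shape
    total = tactile_img.sum()
    if total <= 0:
        return 0.0  # no contact detected -> no extra emphasis
    ys, xs = np.mgrid[0:h, 0:w]
    cy = (ys * tactile_img).sum() / total  # contact centroid (row)
    cx = (xs * tactile_img).sum() / total  # contact centroid (col)
    # normalized distance of the centroid from the image center
    d = np.hypot((cy - (h - 1) / 2) / h, (cx - (w - 1) / 2) / w)
    return float(np.exp(-(d ** 2) / (2 * sigma ** 2)))

def weighted_action_loss(pred, target, tactile_img, base=1.0, gain=1.0):
    """L2 action loss, up-weighted for samples where contact is centered,
    so the policy is pushed hardest on well-aligned tracing actions."""
    w = base + gain * centroid_weight(tactile_img)
    return w * float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))
```

A demonstration frame with centered contact thus contributes more to the gradient than one where the object is slipping toward the sensor edge, which matches the stated goal of emphasizing fine-grained adjustments that keep contact centered.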