🤖 AI Summary
Manipulating deformable and fragile objects—such as tofu—requires precise control of contact forces to avoid irreversible damage, yet conventional approaches rely on accurate physical modeling or dedicated stress sensors, limiting practicality.
Method: This paper proposes a vision-based reinforcement learning framework that eliminates the need for explicit physics models or tactile sensors. It introduces a stress-guided reward function, integrates curriculum learning (progressing from rigid to deformable objects), and initializes policies via offline demonstrations; end-to-end training is conducted in simulation, enabling zero-shot sim-to-real transfer.
Contribution/Results: The key innovation lies in implicitly estimating contact stress from visual inputs and explicitly penalizing excessive force via a stress-aware reward mechanism. Curriculum learning enhances policy generalization and training stability. In real-world tofu grasping and pushing tasks, the method reduces applied stress by 36.5% compared to baseline RL methods, effectively preventing structural damage and demonstrating superior efficacy and robustness.
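The stress-aware reward mechanism described above can be sketched as a simple penalty on estimated contact stress added to the task reward. This is an illustrative sketch, not the paper's implementation: the function name, the linear penalty form, and the `stress_limit`/`penalty_weight` parameters are assumptions, and the stress estimate would in practice come from the vision-based policy rather than a direct sensor.

```python
# Hypothetical sketch of a stress-penalized reward. All names and the
# penalty form are illustrative assumptions, not taken from the paper.

def stress_penalized_reward(task_reward: float,
                            contact_stress: float,
                            stress_limit: float = 1.0,
                            penalty_weight: float = 0.5) -> float:
    """Combine task progress with a penalty for stress above a safe limit.

    contact_stress: a (possibly implicit, vision-derived) estimate of the
    stress currently applied to the object.
    """
    excess = max(0.0, contact_stress - stress_limit)
    return task_reward - penalty_weight * excess
```

Under this kind of shaping, the policy is rewarded for task progress but loses reward whenever its estimated contact stress exceeds the safe threshold, which encourages the gentle manipulation behavior reported in the results.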
📝 Abstract
Robotic manipulation of deformable and fragile objects presents significant challenges, as excessive stress can cause irreversible damage to the object. Existing solutions rely on accurate object models or specialized sensors and grippers, which adds complexity and often limits generalization. To address this problem, we present a vision-based reinforcement learning approach that incorporates a stress-penalized reward to explicitly discourage damage to the object. In addition, to bootstrap learning, we incorporate offline demonstrations as well as a curriculum progressing from rigid proxies to deformables. We evaluate the proposed method in both simulated and real-world scenarios, showing that the policy learned in simulation can be transferred to the real world in a zero-shot manner, performing tasks such as picking up and pushing tofu. Our results show that the learned policies exhibit damage-aware, gentle manipulation behavior: compared to vanilla RL policies, they decrease the stress applied to fragile objects by 36.5% while still achieving the task goals.