🤖 AI Summary
Dexterous in-hand object rotation with multi-fingered hands remains challenging, particularly in palm-up and palm-down configurations, due to limited tactile and haptic feedback and the difficulty of designing effective reward functions.
Method: We propose the first framework that leverages large language models (LLMs) to automatically synthesize interpretable, compact reward functions for embodied tactile dexterous manipulation. Our approach combines real-world vision-based tactile sensing with multimodal observations, uses scalable prompt engineering to generate human-readable rewards, and applies sim-to-real policy distillation for efficient transfer.
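To make this concrete, below is a minimal hypothetical sketch of the kind of compact, human-readable reward an LLM might synthesize for tactile in-hand rotation. It is not the paper's actual output; the tensor names (angvel, target_axis, contacts, obj_linvel) are assumed stand-ins for the simulator state exposed in the prompt.

```python
import torch

def compute_reward(angvel: torch.Tensor,       # object angular velocity, shape (B, 3)
                   target_axis: torch.Tensor,  # commanded rotation axis, shape (B, 3)
                   contacts: torch.Tensor,     # per-fingertip contact flags, shape (B, 4)
                   obj_linvel: torch.Tensor    # object linear velocity, shape (B, 3)
                   ) -> torch.Tensor:
    # Reward angular velocity projected onto the commanded rotation axis.
    rotation = (angvel * target_axis).sum(dim=-1)
    # Encourage stable multi-finger grasps via the tactile contact count.
    contact_bonus = 0.1 * contacts.sum(dim=-1)
    # Penalise object translation, which signals slipping or dropping.
    slip_penalty = 0.5 * obj_linvel.norm(dim=-1)
    return rotation + contact_bonus - slip_penalty
```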
Contribution/Results: This work pioneers LLM-driven reward design for tactile dexterous manipulation, substantially reduces reward-design time, and demonstrates superior rotation speed and stability on a fully actuated four-fingered dexterous hand compared to manually tuned baselines. The generated rewards are shorter and more generalizable, enabling more automated and scalable tactile manipulation and advancing toward universal, data-efficient dexterity.
📝 Abstract
Large language models (LLMs) are beginning to automate reward design for dexterous manipulation. However, no prior work has considered tactile sensing, which is known to be critical for human-like dexterity. We present Text2Touch, bringing LLM-crafted rewards to the challenging task of multi-axis in-hand object rotation with real-world vision-based tactile sensing in palm-up and palm-down configurations. Our prompt engineering strategy scales to over 70 environment variables, and sim-to-real distillation enables successful policy transfer to a tactile-enabled fully actuated four-fingered dexterous robot hand. Text2Touch significantly outperforms a carefully tuned human-engineered baseline, demonstrating superior rotation speed and stability while relying on reward functions that are an order of magnitude shorter and simpler. These results illustrate how LLM-designed rewards can significantly reduce the time from concept to deployable dexterous tactile skills, supporting more rapid and scalable multimodal robot learning. Project website: https://hpfield.github.io/text2touch-website
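As an illustration of the sim-to-real distillation step mentioned above, here is a minimal sketch, under assumed interfaces, of teacher-student policy distillation: a teacher trained in simulation with privileged state supervises a student that sees only deployable observations (proprioception plus tactile). `teacher`, `student`, and the batch keys are hypothetical placeholders, not the paper's code.

```python
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, batch, optimizer):
    # Teacher acts on privileged simulator state (e.g., object pose, friction).
    with torch.no_grad():
        target_actions = teacher(batch["privileged_obs"])
    # Student acts only on observations available on the real robot.
    pred_actions = student(batch["tactile_obs"])
    # Regress student actions onto teacher actions.
    loss = F.mse_loss(pred_actions, target_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```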