TLA: Tactile-Language-Action Model for Contact-Rich Manipulation

📅 2025-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the lack of joint tactile-language-action modeling in haptically intensive, language-guided robotic manipulation tasks—such as fingertip peg-in-hole assembly—this paper proposes the first language-tactile-action trimodal alignment framework, leveraging language anchoring for semantic interpretation of tactile sequences and robust policy generation. Methodologically, we design a tactile sequence encoder and a language-guided cross-modal attention mechanism, integrated into an end-to-end action decoder. We also introduce the first large-scale, 24K-sample dataset pairing fingertip tactile signals with natural-language instructions. Evaluation shows >85% success rate on unseen peg geometries and insertion gaps—significantly outperforming diffusion-based policies and other baselines—while achieving substantial gains in both action accuracy and task effectiveness.
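The core mechanism described above, a language-guided cross-modal attention over a tactile sequence, can be sketched minimally: the pooled instruction embedding acts as the query, and the per-timestep tactile features act as keys and values. This is a simplified illustration under assumed shapes, not the paper's actual implementation; all function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(lang_emb, tactile_seq):
    """Language-as-query attention over a tactile sequence (illustrative).

    lang_emb:    (d,)  pooled instruction embedding (query)
    tactile_seq: (T, d) per-timestep tactile features (keys/values)
    Returns a (d,) language-grounded summary of the tactile sequence.
    """
    d_k = lang_emb.shape[-1]
    scores = tactile_seq @ lang_emb / np.sqrt(d_k)  # (T,) similarity per timestep
    weights = softmax(scores)                       # attention over time
    return weights @ tactile_seq                    # weighted tactile context

# Toy usage with random features (hypothetical dimensions).
rng = np.random.default_rng(0)
d, T = 8, 16
lang = rng.normal(size=d)
tactile = rng.normal(size=(T, d))
context = cross_modal_attention(lang, tactile)
print(context.shape)
```

In the full model, such a language-grounded tactile context would then feed an action decoder that emits the assembly policy; the sketch only covers the attention step.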

📝 Abstract
Significant progress has been made in vision-language models. However, language-conditioned robotic manipulation for contact-rich tasks remains underexplored, particularly in terms of tactile sensing. To address this gap, we introduce the Tactile-Language-Action (TLA) model, which effectively processes sequential tactile feedback via cross-modal language grounding to enable robust policy generation in contact-intensive scenarios. In addition, we construct a comprehensive dataset that contains 24k pairs of tactile action instruction data, customized for fingertip peg-in-hole assembly, providing essential resources for TLA training and evaluation. Our results show that TLA significantly outperforms traditional imitation learning methods (e.g., diffusion policy) in terms of effective action generation and action accuracy, while demonstrating strong generalization capabilities by achieving over 85% success rate on previously unseen assembly clearances and peg shapes. We publicly release all data and code in the hope of advancing research in language-conditioned tactile manipulation skill learning. Project website: https://sites.google.com/view/tactile-language-action/
Problem

Research questions and friction points this paper is trying to address.

Develops a model for tactile-language-action integration in robotics.
Addresses lack of tactile sensing in language-conditioned robotic manipulation.
Improves action generation and accuracy in contact-rich tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

TLA model integrates tactile feedback with language grounding.
Dataset includes 24k tactile action instruction pairs.
TLA outperforms traditional methods in action accuracy.
Peng Hao
Samsung Research China - Beijing (SRC-B), Beijing 100028, China
Chaofan Zhang
Institute of Automation, Chinese Academy of Sciences
tactile perception and robot dexterous manipulation
Dingzhe Li
Samsung Research China - Beijing (SRC-B), Beijing 100028, China
Xiaoge Cao
Institute of Automation, Chinese Academy of Sciences
Xiaoshuai Hao
Beijing Academy of Artificial Intelligence (BAAI)
vision and language
Shaowei Cui
The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Shuo Wang