TLA: Tactile-Language-Action Model for Contact-Rich Manipulation

📅 2025-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the lack of joint tactile-language-action modeling in haptically intensive, language-guided robotic manipulation tasks—such as fingertip peg-in-hole assembly—this paper proposes the first language-tactile-action trimodal alignment framework, leveraging language anchoring for semantic interpretation of tactile sequences and robust policy generation. Methodologically, we design a tactile sequence encoder and a language-guided cross-modal attention mechanism, integrated into an end-to-end action decoder. We also introduce the first large-scale, 24K-sample dataset pairing fingertip tactile signals with natural-language instructions. Evaluation shows >85% success rate on unseen peg geometries and insertion gaps—significantly outperforming diffusion-based policies and other baselines—while achieving substantial gains in both action accuracy and task effectiveness.
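The core mechanism described above, a language-guided cross-modal attention over a tactile sequence, can be sketched minimally: the pooled instruction embedding acts as the query, and the per-timestep tactile features act as keys and values. This is a simplified illustration under assumed shapes, not the paper's actual implementation; all function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(lang_emb, tactile_seq):
    """Language-as-query attention over a tactile sequence (illustrative).

    lang_emb:    (d,)  pooled instruction embedding (query)
    tactile_seq: (T, d) per-timestep tactile features (keys/values)
    Returns a (d,) language-grounded summary of the tactile sequence.
    """
    d_k = lang_emb.shape[-1]
    scores = tactile_seq @ lang_emb / np.sqrt(d_k)  # (T,) similarity per timestep
    weights = softmax(scores)                       # attention over time
    return weights @ tactile_seq                    # weighted tactile context

# Toy usage with random features (hypothetical dimensions).
rng = np.random.default_rng(0)
d, T = 8, 16
lang = rng.normal(size=d)
tactile = rng.normal(size=(T, d))
context = cross_modal_attention(lang, tactile)
print(context.shape)
```

In the full model, such a language-grounded tactile context would then feed an action decoder that emits the assembly policy; the sketch only covers the attention step.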

📝 Abstract
Significant progress has been made in vision-language models. However, language-conditioned robotic manipulation for contact-rich tasks remains underexplored, particularly in terms of tactile sensing. To address this gap, we introduce the Tactile-Language-Action (TLA) model, which effectively processes sequential tactile feedback via cross-modal language grounding to enable robust policy generation in contact-intensive scenarios. In addition, we construct a comprehensive dataset that contains 24k pairs of tactile action instruction data, customized for fingertip peg-in-hole assembly, providing essential resources for TLA training and evaluation. Our results show that TLA significantly outperforms traditional imitation learning methods (e.g., diffusion policy) in terms of effective action generation and action accuracy, while demonstrating strong generalization capabilities by achieving over 85% success rate on previously unseen assembly clearances and peg shapes. We publicly release all data and code in the hope of advancing research in language-conditioned tactile manipulation skill learning. Project website: https://sites.google.com/view/tactile-language-action/
Problem

Research questions and friction points this paper is trying to address.

Develops a model for tactile-language-action integration in robotics.
Addresses lack of tactile sensing in language-conditioned robotic manipulation.
Improves action generation and accuracy in contact-rich tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

TLA model integrates tactile feedback with language grounding.
Dataset includes 24k tactile action instruction pairs.
TLA outperforms traditional methods in action accuracy.
Peng Hao
Samsung Research China - Beijing (SRC-B), Beijing 100028, China
Chaofan Zhang
Institute of Automation, Chinese Academy of Sciences
tactile perception and robot dexterous manipulation
Dingzhe Li
Samsung Research China - Beijing (SRC-B), Beijing 100028, China
Xiaoge Cao
Institute of Automation, Chinese Academy of Sciences
Xiaoshuai Hao
Beijing Academy of Artificial Intelligence (BAAI)
vision and language
Shaowei Cui
The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Shuo Wang