VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback

📅 2025-07-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current vision-language-action (VLA) models struggle to incorporate tactile signals, limiting their planning accuracy and manipulation fidelity in contact-intensive tasks—largely due to the scarcity of high-quality multimodal tactile datasets. To address this, we propose a two-tier tactile enhancement framework: (1) an upper-level semantic tactile feedback module that leverages a pretrained tactile-language model to map raw tactile sensor data into linguistically interpretable semantic descriptions for high-level planning; and (2) a lower-level diffusion-based controller that fuses visual, linguistic, and tactile inputs to refine VLA-generated actions for contact-rich manipulation. Crucially, our approach integrates tactile information non-intrusively, without fine-tuning the underlying VLA model. Evaluated on a real robotic platform, our method achieves significant improvements in execution accuracy (+23.6%) and planning success rate (+18.4%) for contact-rich tasks. The implementation is publicly available.
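The two-tier flow described above can be sketched in a few lines. This is a minimal, illustrative mock-up, not the authors' implementation: every function name and return value here is a placeholder standing in for the paper's pretrained tactile-language model, frozen VLA planner, and diffusion-based controller.

```python
# Hypothetical sketch of VLA-Touch's dual-level tactile pipeline.
# All names and outputs are illustrative stand-ins, not the released API.

def tactile_to_semantics(tactile_frame):
    """Upper level: a pretrained tactile-language model maps raw tactile
    data to a language description (stubbed with a fixed caption here)."""
    return "surface is smooth; grip force is low"

def plan_with_vla(image, instruction, tactile_caption):
    """The frozen VLA consumes the tactile caption as extra language
    context for planning -- the base model is never fine-tuned."""
    prompt = f"{instruction}\nTactile feedback: {tactile_caption}"
    return ["reach", "grasp", "lift"]  # placeholder action plan

def refine_action(vla_action, tactile_frame, alpha=0.5):
    """Lower level: a diffusion-based controller conditioned on tactile
    input refines the VLA's action; the diffusion output is stubbed as a
    zero correction and blended with the original action."""
    correction = [0.0] * len(vla_action)  # stand-in for diffusion output
    return [(1 - alpha) * a + alpha * c
            for a, c in zip(vla_action, correction)]
```

The key design point mirrored here is non-intrusiveness: tactile information enters only through the language prompt (upper level) and a post-hoc action refinement (lower level), so the pretrained VLA's weights stay untouched.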

📝 Abstract
Tactile feedback is generally recognized to be crucial for effective interaction with the physical world. However, state-of-the-art Vision-Language-Action (VLA) models lack the ability to interpret and use tactile signals, limiting their effectiveness in contact-rich tasks. Incorporating tactile feedback into these systems is challenging due to the absence of large multi-modal datasets. We present VLA-Touch, an approach that enhances generalist robot policies with tactile sensing without fine-tuning the base VLA. Our method introduces two key innovations: (1) a pipeline that leverages a pretrained tactile-language model that provides semantic tactile feedback for high-level task planning, and (2) a diffusion-based controller that refines VLA-generated actions with tactile signals for contact-rich manipulation. Through real-world experiments, we demonstrate that our dual-level integration of tactile feedback improves task planning efficiency while enhancing execution precision. Code is open-sourced at https://github.com/jxbi1010/VLA-Touch.
Problem

Research questions and friction points this paper is trying to address.

VLA models lack tactile feedback for contact-rich tasks
Absence of multi-modal datasets hinders tactile integration
How to integrate tactile feedback at both the planning and execution levels
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pretrained tactile-language model for semantic feedback
Diffusion-based controller refining VLA actions
Dual-level tactile integration without fine-tuning