🤖 AI Summary
Current vision-language-action (VLA) models struggle to incorporate tactile signals, limiting their planning accuracy and manipulation fidelity in contact-intensive tasks—largely due to the scarcity of high-quality multimodal tactile datasets. To address this, we propose a two-tier tactile enhancement framework: (1) an upper-level semantic tactile feedback module that leverages a pretrained tactile-language model to map raw tactile sensor data into linguistically interpretable semantic descriptions; and (2) a lower-level diffusion-based controller that fuses visual, linguistic, and tactile-semantic inputs to generate robust action policies. Crucially, our approach integrates tactile information non-intrusively—without fine-tuning the underlying VLA model. Evaluated on a real robotic platform, our method achieves significant improvements in execution accuracy (+23.6%) and planning success rate (+18.4%) for contact-rich tasks. The implementation is publicly available.
📝 Abstract
Tactile feedback is generally recognized to be crucial for effective interaction with the physical world. However, state-of-the-art Vision-Language-Action (VLA) models lack the ability to interpret and use tactile signals, limiting their effectiveness in contact-rich tasks. Incorporating tactile feedback into these systems is challenging due to the absence of large multi-modal datasets. We present VLA-Touch, an approach that enhances generalist robot policies with tactile sensing emph{without fine-tuning} the base VLA. Our method introduces two key innovations: (1) a pipeline that leverages a pretrained tactile-language model that provides semantic tactile feedback for high-level task planning, and (2) a diffusion-based controller that refines VLA-generated actions with tactile signals for contact-rich manipulation. Through real-world experiments, we demonstrate that our dual-level integration of tactile feedback improves task planning efficiency while enhancing execution precision. Code is open-sourced at href{https://github.com/jxbi1010/VLA-Touch}{this URL}.