TextToucher: Fine-Grained Text-to-Touch Generation

📅 2024-09-09
🏛️ arXiv.org
📈 Citations: 3 · Influential: 0
🤖 AI Summary
Existing tactile image generation methods rely heavily on visual inputs, which leads to inaccurate modeling of fine-grained tactile features such as surface texture, object geometry, and the deformation state of the sensor gel. This work introduces the first text-to-tactile-image generation framework that operates without visual priors. Methodologically, the authors (1) propose a dual-granularity textual modeling and fusion mechanism covering object-level information (texture, shape) and sensor-level information (gel deformation); (2) inject the dual-granularity conditions via a multimodal large language model, a set of learnable text prompts, and a diffusion Transformer architecture; and (3) design a Contrastive Text-Touch Pre-training (CTTP) metric for evaluation. Extensive experiments across multiple tactile datasets demonstrate significant improvements over vision-driven baselines, with generated tactile images exhibiting higher fidelity to human tactile perception. The code is publicly available.
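
The conditioning scheme lends itself to a compact sketch. The snippet below shows one plausible reading of the dual-granularity injection: object-level caption tokens (a text encoder's output over the MLLM-generated description) are combined with learnable sensor-level prompt tokens before entering the diffusion Transformer's conditioning path. All module names, tensor shapes, and the choice of a discrete gel-state index are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of dual-granularity text conditioning.
# Names, shapes, and the discrete gel-state index are assumptions.
import torch
import torch.nn as nn

class DualGrainCondition(nn.Module):
    """Fuses object-level caption tokens with learnable sensor-level prompts."""
    def __init__(self, embed_dim=768, num_sensor_prompts=8, num_gel_states=4):
        super().__init__()
        # One learnable prompt set per discrete gel-deformation state (assumed).
        self.sensor_prompts = nn.Parameter(
            torch.randn(num_gel_states, num_sensor_prompts, embed_dim) * 0.02
        )
        self.fuse = nn.MultiheadAttention(embed_dim, num_heads=8, batch_first=True)

    def forward(self, object_tokens, gel_state):
        # object_tokens: (B, L, D) text-encoder output for the MLLM caption
        # gel_state:     (B,) integer id of the sensor gel status
        sensor_tokens = self.sensor_prompts[gel_state]          # (B, P, D)
        # Sensor prompts attend to the object-level tokens, then concatenate,
        # yielding one conditioning sequence for the diffusion Transformer.
        fused, _ = self.fuse(sensor_tokens, object_tokens, object_tokens)
        return torch.cat([object_tokens, fused], dim=1)         # (B, L+P, D)

cond = DualGrainCondition()
tokens = cond(torch.randn(2, 77, 768), torch.tensor([0, 3]))
print(tokens.shape)  # torch.Size([2, 85, 768])
```

The concatenated sequence can then be consumed by any standard DiT conditioning mechanism (e.g., cross-attention); the paper reports exploring several such injection variants.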

📝 Abstract
Tactile sensation plays a crucial role in the development of multi-modal large models and embodied intelligence. To collect tactile data at minimal cost, a series of studies have attempted to generate tactile images via vision-to-touch image translation. However, compared with the text modality, visually driven tactile generation cannot accurately depict human tactile sensation. In this work, we analyze the characteristics of tactile images in detail at two granularities: object-level (tactile texture, tactile shape) and sensor-level (gel status). We model these granularities of information through text descriptions and propose a fine-grained Text-to-Touch generation method (TextToucher) to generate high-quality tactile samples. Specifically, we introduce a multimodal large language model to build the text sentences about object-level tactile information and employ a set of learnable text prompts to represent the sensor-level tactile information. To better guide the tactile generation process with the built text information, we fuse the two granularities of text information and explore various dual-grain text conditioning methods within the diffusion Transformer architecture. Furthermore, we propose a Contrastive Text-Touch Pre-training (CTTP) metric to precisely evaluate the quality of text-driven generated tactile data. Extensive experiments demonstrate the superiority of our TextToucher method. The source code will be available at https://github.com/TtuHamg/TextToucher.
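
For context, CTTP is described as a contrastive text-touch pre-training metric; a standard way to realize such an objective is a symmetric, CLIP-style InfoNCE loss over paired text and tactile embeddings. The sketch below assumes that formulation; the function name, temperature, and embedding dimensions are illustrative and not taken from the paper.

```python
# Minimal sketch of a CLIP-style contrastive objective, assuming CTTP
# follows the usual symmetric InfoNCE formulation; encoders are omitted.
import torch
import torch.nn.functional as F

def cttp_contrastive_loss(text_emb, touch_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired text / tactile embeddings."""
    text_emb = F.normalize(text_emb, dim=-1)
    touch_emb = F.normalize(touch_emb, dim=-1)
    logits = text_emb @ touch_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(text_emb.size(0), device=text_emb.device)
    # Matched pairs lie on the diagonal; push them above all mismatched pairs.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = cttp_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```

Once such encoders are trained, an evaluation score for a generated tactile image can be derived from its cosine similarity to the conditioning text in the shared embedding space.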
Problem

Research questions and friction points this paper is trying to address.

Visual-to-Tactile Sensing
Texture Recognition
Shape Perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

TextToucher
CTTP metric
Tactile data generation
👥 Authors

Jiahang Tu
PhD Student, Zhejiang University
Computer Vision · Generative Models

Hao Fu
College of Computer Science and Technology, Zhejiang University

Fengyu Yang
Yale University

Han Zhao
College of Computer Science and Technology, Zhejiang University; Advanced Technology Institute, Zhejiang University

Chao Zhang
College of Computer Science and Technology, Zhejiang University; Advanced Technology Institute, Zhejiang University

Hui Qian
College of Computer Science and Technology, Zhejiang University
Artificial Intelligence