🤖 AI Summary
This work proposes the first language-driven multimodal tactile texture generation system, addressing the inefficiency and creative constraints of traditional methods that rely on parameter tuning and trial-and-error. The system enables synchronous synthesis of semantically consistent sliding vibration signals, transient tapping responses, and visual texture images from a single text prompt. By constructing a language-aligned shared latent space and integrating a force/velocity-conditioned autoregressive model, a transient tapping model, and a text-guided diffusion model, it achieves coherent cross-modal generation. User studies demonstrate that the approach accurately conveys material semantics—such as roughness, slipperiness, and hardness—and substantially reduces design iteration costs, thereby validating natural language as an effective and interpretable modality for tactile texture control.
📝 Abstract
Authoring realistic haptic textures typically requires low-level parameter tuning and repeated trial-and-error, limiting speed, transparency, and creative reach. We present a language-driven authoring system that turns natural-language prompts into multimodal textures: two coordinated haptic channels - sliding vibrations from force/speed-conditioned autoregressive (AR) models and transient tapping responses - plus a text-prompted visual preview from a diffusion model. A shared, language-aligned latent space links the modalities, so a single prompt yields semantically consistent haptic and visual signals; designers can state goals (e.g., "gritty but cushioned surface," "smooth and hard metal surface") and immediately see and feel the result through a 3D haptic device. To verify that the learned latent space encodes perceptually meaningful structure, we conduct an anchor-referenced, attribute-wise evaluation of roughness, slipperiness, and hardness. Participant ratings are projected onto the interpretable line between two real-material references, revealing consistent trends - asperity effects in roughness, compliance effects in hardness, and surface-film influence in slipperiness. A human-subject study further indicates a coherent cross-modal experience and low effort for prompt-based iteration. The results show that language can serve as a practical control modality for texture authoring: prompts reliably steer material semantics across haptic and visual channels, enabling a prompt-first, designer-oriented workflow that replaces manual parameter tuning with interpretable, text-guided refinement.
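To make the force/speed-conditioned AR channel concrete, here is a minimal toy sketch of what such conditioning could look like. This is not the paper's model: the AR order, the coefficient mapping from speed, and the force-to-gain scaling are all illustrative assumptions; the actual system learns these relationships.

```python
import math
import random

def generate_sliding_vibration(force, speed, n_samples=64, seed=0):
    """Toy AR(2) vibration synthesis conditioned on normal force and
    sliding speed. Illustrative only: real systems learn the mapping
    from (force, speed) to AR coefficients and excitation energy."""
    rng = random.Random(seed)
    # Assumed conditioning: higher speed shifts the resonance (via a1),
    # higher force scales the excitation amplitude (via gain).
    a1 = 1.8 * math.cos(2 * math.pi * 0.05 * (1.0 + speed))
    a2 = -0.95  # |a2| < 1 keeps this AR(2) recursion stable
    gain = 0.1 * force
    x = [0.0, 0.0]  # two warm-up samples for the order-2 recursion
    for _ in range(n_samples):
        excitation = gain * rng.gauss(0.0, 1.0)
        x.append(a1 * x[-1] + a2 * x[-2] + excitation)
    return x[2:]  # drop the warm-up samples
```

In a real authoring loop, the device's measured contact force and slide speed would be fed in per frame so the vibration responds to how the user explores the surface.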
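The anchor-referenced evaluation reduces to a standard projection: a sample's attribute ratings are mapped to a scalar position along the line between the ratings of two real-material references. A minimal sketch, with hypothetical names (the paper does not specify this exact computation):

```python
def project_to_anchor_line(rating, anchor_a, anchor_b):
    """Project a rating vector onto the line through two reference-material
    rating vectors. Returns 0.0 at anchor_a, 1.0 at anchor_b; values in
    between indicate where the sample falls relative to the references."""
    diff = [b - a for a, b in zip(anchor_a, anchor_b)]
    offset = [r - a for r, a in zip(rating, anchor_a)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        raise ValueError("reference anchors must be distinct")
    return sum(o * d for o, d in zip(offset, diff)) / denom
```

This yields the interpretable one-dimensional scale the abstract describes: for a roughness axis anchored by, say, a smooth and a rough reference material, a generated texture rated midway between them projects to about 0.5.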