Language-Guided Multimodal Texture Authoring via Generative Models

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes the first language-driven multimodal tactile texture generation system, addressing the inefficiency and creative constraints of traditional methods that rely on parameter tuning and trial-and-error. The system enables coordinated synthesis of semantically consistent sliding vibration signals, transient tapping responses, and visual texture images from a single text prompt. By constructing a language-aligned shared latent space and integrating a force/speed-conditioned autoregressive model, a transient tapping model, and a text-guided diffusion model, it achieves coherent cross-modal generation. User studies demonstrate that the approach accurately conveys material semantics, such as roughness, slipperiness, and hardness, and substantially reduces design iteration costs, validating natural language as an effective and interpretable modality for tactile texture control.
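The summary above outlines the architecture, but no implementation is published. The following is a minimal, runnable toy sketch of the prompt-to-multimodal flow, in which every component (a hash-based "latent", an AR(1) process for sliding vibration, a decaying sinusoid for tapping) is an illustrative stand-in for the paper's learned models, not the authors' method.

```python
# Toy sketch of the prompt-to-multimodal-texture flow summarized above.
# Every component here is an illustrative stand-in (hash-based "latent",
# AR(1) noise for sliding vibration, a decaying sinusoid for tapping);
# the paper's actual models are learned networks and are not reproduced.

import hashlib
import numpy as np

def text_to_latent(prompt: str, dim: int = 8) -> np.ndarray:
    """Stand-in for the language-aligned shared latent encoder."""
    seed = int.from_bytes(hashlib.sha256(prompt.encode()).digest()[:4], "big")
    return np.random.default_rng(seed).standard_normal(dim)

def sliding_vibration(z: np.ndarray, force: float, speed: float,
                      n: int = 4096) -> np.ndarray:
    """Toy force/speed-conditioned AR(1) process for sliding vibration."""
    rng = np.random.default_rng(abs(int(z[0] * 1e6)))
    a = 0.95                     # AR coefficient; learned in the paper
    gain = 0.01 * force * speed  # contact conditions scale vibration amplitude
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = a * x[t - 1] + gain * rng.standard_normal()
    return x

def tapping_transient(z: np.ndarray, n: int = 512, fs: float = 1000.0) -> np.ndarray:
    """Toy tapping response: a decaying sinusoid whose frequency and decay
    stand in for perceived hardness."""
    freq = 100.0 + 50.0 * abs(z[1])   # "harder" latents ring at higher frequency
    decay = 20.0 + 10.0 * abs(z[2])
    t = np.arange(n) / fs
    return np.exp(-decay * t) * np.sin(2 * np.pi * freq * t)

# The visual channel would come from a text-guided image diffusion model
# driven by the same prompt; omitted here.
z = text_to_latent("gritty but cushioned surface")
vib = sliding_vibration(z, force=2.0, speed=0.1)
tap = tapping_transient(z)
print(vib.shape, tap.shape)   # (4096,) (512,)
```

The key design point this sketch mirrors is that a single prompt-derived latent conditions all haptic channels, which is what keeps the outputs semantically consistent.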
📝 Abstract
Authoring realistic haptic textures typically requires low-level parameter tuning and repeated trial-and-error, limiting speed, transparency, and creative reach. We present a language-driven authoring system that turns natural-language prompts into multimodal textures: two coordinated haptic channels (sliding vibrations via force/speed-conditioned autoregressive (AR) models, and tapping transients) and a text-prompted visual preview from a diffusion model. A shared, language-aligned latent space links the modalities, so a single prompt yields semantically consistent haptic and visual signals; designers can write goals (e.g., "gritty but cushioned surface," "smooth and hard metal surface") and immediately see and feel the result through a 3D haptic device. To verify that the learned latent encodes perceptually meaningful structure, we conduct an anchor-referenced, attribute-wise evaluation for roughness, slipperiness, and hardness. Participant ratings are projected onto the interpretable line between two real-material references, revealing consistent trends: asperity effects in roughness, compliance effects in hardness, and surface-film influence in slipperiness. A human-subject study further indicates a coherent cross-modal experience and low effort for prompt-based iteration. The results show that language can serve as a practical control modality for texture authoring: prompts reliably steer material semantics across haptic and visual channels, enabling a prompt-first, designer-oriented workflow that replaces manual parameter tuning with interpretable, text-guided refinement.
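The anchor-referenced evaluation mentioned in the abstract projects participant ratings onto the line between two real-material references. Below is a minimal sketch of one plausible reading of that step, assuming ratings are numeric attribute vectors; the paper does not spell out the formula, so the orthogonal projection used here is an assumption.

```python
import numpy as np

def anchor_projection(rating: np.ndarray, anchor_a: np.ndarray,
                      anchor_b: np.ndarray) -> float:
    """Project a rating onto the line through two real-material anchors.
    Returns a scalar t with t=0 at anchor A and t=1 at anchor B. This is
    the standard orthogonal projection; the paper's exact procedure may
    differ."""
    d = anchor_b - anchor_a
    return float(np.dot(rating - anchor_a, d) / np.dot(d, d))

# Hypothetical [roughness, slipperiness, hardness] ratings for two anchors
# and one generated texture; values are made up for illustration.
smooth_anchor = np.array([1.0, 6.0, 5.0])
rough_anchor  = np.array([6.0, 2.0, 6.0])
generated     = np.array([5.0, 3.0, 6.0])
print(anchor_projection(generated, smooth_anchor, rough_anchor))  # ~0.79
```

Reducing each rating to a single scalar between two physical anchors is what makes the attribute trends (asperity, compliance, surface film) directly interpretable.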
Problem

Research questions and friction points this paper is trying to address.

haptic texture authoring
parameter tuning
trial-and-error
multimodal consistency
creative workflow
Innovation

Methods, ideas, or system contributions that make the work stand out.

language-guided texture authoring
multimodal haptics
generative models
cross-modal consistency
latent space alignment
Wanli Qian
Department of Computer Science, University of Southern California, Los Angeles, USA
Aiden Chang
Department of Computer Science, University of Southern California, Los Angeles, USA
Shihan Lu
Postdoctoral Scholar, Northwestern University
Haptics · Robot Manipulation · Texture Rendering · Multisensory Perception
Michael Gu
Department of Computer Science, University of Southern California, Los Angeles, USA
Heather Culbertson
Assistant Professor, Computer Science, University of Southern California