The Language of Touch: Translating Vibrations into Text with Dual-Branch Learning

📅 2026-03-26
🤖 AI Summary
This work addresses the lack of semantic descriptions for tactile vibration signals by introducing the task of tactile captioning and presenting LMT108-CAP, the first paired tactile-text dataset. To tackle this task, the authors propose ViPAC, a dual-branch neural network that disentangles the periodic and aperiodic components of tactile signals. The method incorporates an orthogonality constraint to keep the two branches' features complementary and a dynamic fusion mechanism that adaptively integrates multi-scale information. Experiments show that ViPAC significantly outperforms baselines adapted from audio and image captioning on LMT108-CAP, achieving superior performance in both lexical fidelity and semantic alignment.
📝 Abstract
The standardization of vibrotactile data by the IEEE P1918.1 working group has greatly advanced its applications in virtual reality, human-computer interaction, and embodied artificial intelligence. Despite these efforts, the semantic interpretation and understanding of vibrotactile signals remain an unresolved challenge. In this paper, we make the first attempt to address vibrotactile captioning, i.e., generating natural language descriptions from vibrotactile signals. We propose Vibrotactile Periodic-Aperiodic Captioning (ViPAC), a method designed to handle the intrinsic properties of vibrotactile data, including hybrid periodic-aperiodic structures and the lack of spatial semantics. Specifically, ViPAC employs a dual-branch strategy to disentangle periodic and aperiodic components, combined with a dynamic fusion mechanism that adaptively integrates signal features. It also introduces an orthogonality constraint and weighting regularization to ensure feature complementarity and fusion consistency. Additionally, we construct LMT108-CAP, the first vibrotactile-text paired dataset, using GPT-4o to generate five constrained captions per surface image from the popular LMT-108 dataset. Experiments show that ViPAC significantly outperforms baseline methods adapted from audio and image captioning, achieving superior lexical fidelity and semantic alignment.
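The abstract's two key training-time ideas can be sketched in a few lines. Below is a minimal, hypothetical illustration (not the authors' implementation): an orthogonality penalty that pushes the periodic and aperiodic branch features toward complementarity, and a gated "dynamic fusion" that adaptively mixes the two feature vectors. All names (`orthogonality_loss`, `dynamic_fusion`, `w_gate`) are assumptions for illustration only.

```python
import numpy as np

def orthogonality_loss(f_per, f_aper):
    """Squared cosine similarity between the two branch features.

    Approaches 0 when the periodic and aperiodic branches encode
    complementary (near-orthogonal) information.
    """
    num = np.dot(f_per, f_aper)
    den = np.linalg.norm(f_per) * np.linalg.norm(f_aper) + 1e-8
    return (num / den) ** 2

def dynamic_fusion(f_per, f_aper, w_gate):
    """Toy gated fusion: a scalar sigmoid gate, computed from both
    features, adaptively weights the mix of the two branches."""
    z = np.dot(w_gate, np.concatenate([f_per, f_aper]))
    alpha = 1.0 / (1.0 + np.exp(-z))
    return alpha * f_per + (1.0 - alpha) * f_aper

# Random stand-ins for the two branch outputs.
rng = np.random.default_rng(0)
d = 8
f_per = rng.standard_normal(d)
f_aper = rng.standard_normal(d)
w_gate = rng.standard_normal(2 * d)

loss = orthogonality_loss(f_per, f_aper)
fused = dynamic_fusion(f_per, f_aper, w_gate)
print(loss, fused.shape)
```

In a real model the loss term would be added to the captioning objective, and the gate would be a small learned network rather than a fixed random vector; the sketch only shows the shape of the computation.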
Problem

Research questions and friction points this paper is trying to address.

vibrotactile captioning
semantic interpretation
tactile signals
natural language generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

vibrotactile captioning
dual-branch learning
periodic-aperiodic disentanglement
dynamic fusion
tactile-text dataset
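The "periodic-aperiodic disentanglement" idea above can be illustrated with a crude signal-level analogue (the paper learns this split with a neural network; the spectral heuristic below is only a stand-in): keep the strongest FFT bins as the "periodic" component of a vibrotactile trace and treat the residual as "aperiodic".

```python
import numpy as np

def split_periodic_aperiodic(x, k=3):
    """Crude spectral split: the k largest-magnitude rFFT bins form the
    'periodic' part; the residual is the 'aperiodic' part."""
    X = np.fft.rfft(x)
    keep = np.argsort(np.abs(X))[-k:]   # indices of dominant bins
    mask = np.zeros_like(X)
    mask[keep] = X[keep]
    periodic = np.fft.irfft(mask, n=len(x))
    aperiodic = x - periodic
    return periodic, aperiodic

# Synthetic vibration: a 50 Hz tone (periodic) plus broadband noise.
fs = 1000
t = np.arange(0, 1, 1 / fs)
rng = np.random.default_rng(42)
signal = np.sin(2 * np.pi * 50 * t) + 0.3 * rng.standard_normal(len(t))

per, aper = split_periodic_aperiodic(signal, k=2)
print(per.shape, aper.shape)
```

By construction the two parts sum back to the input, so nothing is discarded, only routed; a learned dual-branch model plays the same role with far more expressive, data-driven components.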
Jin Chen
Fujian Key Lab for Intelligent Processing and Wireless Transmission of Media Information, Fuzhou University, Fuzhou 350108, China
Yifeng Lin
Fujian Key Lab for Intelligent Processing and Wireless Transmission of Media Information, Fuzhou University, Fuzhou 350108, China
Chao Zeng
School of Artificial Intelligence, Hubei University, Wuhan, China; Key Laboratory of Intelligent Sensing System and Security, Hubei University and Ministry of Education, China
Si Wu
Professor of Computer Science, South China University of Technology
machine learning, computer vision
Tiesong Zhao
Dept. Communication Engineering, Fuzhou University
Multimedia Communication, Video Coding, Image Quality Assessment, Haptics