🤖 AI Summary
This work addresses the cross-modal gap between visual perception and tactile experience by proposing the first real-time, vision-language-driven multimodal system with tactile feedback. Methodologically, it integrates a ConvNeXt-based material recognition network with the Qwen2-VL-2B-Instruct large vision-language model to enable semantically guided material understanding and ambient temperature inference; tactile output is generated via speaker-based vibrotactile feedback coupled with a Peltier thermoregulation module, producing dynamic, discriminable audio-tactile signals. Its key contribution lies in pioneering the integration of large vision-language models into a closed-loop tactile interaction framework and establishing an end-to-end vision-to-tactile cross-modal mapping. Experimental results demonstrate an average recognition accuracy of 84.67% across five distinguishable tactile patterns, and temperature estimation within a ±8°C tolerance (86.7% accuracy across 15 scenarios).
📝 Abstract
This paper introduces HapticVLM, a novel multimodal system that integrates vision-language reasoning with deep convolutional networks to enable real-time haptic feedback. HapticVLM leverages a ConvNeXt-based material recognition module to generate robust visual embeddings for accurate identification of object materials, while a state-of-the-art Vision-Language Model (Qwen2-VL-2B-Instruct) infers ambient temperature from environmental cues. The system synthesizes tactile sensations by delivering vibrotactile feedback through speakers and thermal cues via a Peltier module, thereby bridging the gap between visual perception and tactile experience. Experimental evaluations demonstrate an average recognition accuracy of 84.67% across five distinct auditory-tactile patterns and a temperature estimation accuracy of 86.7% under a tolerance-based evaluation with an 8°C margin of error across 15 scenarios. The study is nonetheless limited by its small set of prominent patterns and modest participant pool. Future work will focus on expanding the range of tactile patterns and conducting larger user studies to further refine and validate the system's performance. Overall, HapticVLM presents a significant step toward context-aware, multimodal haptic interaction, with potential applications in virtual reality and assistive technologies.
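To make the vision-to-tactile mapping concrete, the sketch below illustrates one plausible shape for the system's output stage: a recognized material label selects one of five distinguishable vibrotactile profiles, and the VLM's ambient-temperature estimate is clamped to a safe skin-contact range before driving the Peltier module. The paper does not publish code, so every name, profile value, and safety bound here is an assumption for illustration, not the authors' implementation.

```python
# Hypothetical sketch of HapticVLM's output stage. All names,
# vibrotactile profile values, and temperature bounds are assumptions;
# the paper does not specify these details.
from dataclasses import dataclass

# Illustrative (frequency Hz, normalized amplitude) pairs for five
# distinguishable speaker-driven vibrotactile patterns, one per material.
VIBRO_PROFILES = {
    "metal":   (250.0, 0.9),
    "wood":    (120.0, 0.6),
    "fabric":  (60.0,  0.4),
    "glass":   (300.0, 0.8),
    "plastic": (180.0, 0.5),
}

@dataclass
class HapticCommand:
    freq_hz: float    # speaker-driven vibrotactile frequency
    amplitude: float  # normalized drive amplitude (0-1)
    peltier_c: float  # thermal setpoint for the Peltier module

def make_haptic_command(material: str, est_temp_c: float,
                        t_min: float = 5.0, t_max: float = 45.0) -> HapticCommand:
    """Fuse the ConvNeXt material label and the VLM temperature estimate
    into one actuator command. The Peltier setpoint is clamped to an
    assumed safe skin-contact range [t_min, t_max]."""
    freq, amp = VIBRO_PROFILES[material]
    setpoint = max(t_min, min(t_max, est_temp_c))
    return HapticCommand(freq, amp, setpoint)
```

For example, a scene classified as "metal" with an inferred ambient temperature of 60°C would yield the metal vibration profile with the thermal setpoint clamped down to the 45°C safety bound.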