HapticVLM: VLM-Driven Texture Recognition Aimed at Intelligent Haptic Interaction

📅 2025-05-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the cross-modal gap between visual perception and tactile experience by proposing what the authors describe as the first real-time, vision-language-driven multimodal system with tactile feedback. Methodologically, it integrates a ConvNeXt-based material recognition network with the Qwen2-VL-2B-Instruct vision-language model to enable semantically guided material understanding and ambient-temperature inference; tactile output is generated via speaker-based vibrotactile feedback coupled with a Peltier thermoregulation module, producing dynamic, discriminable audio-tactile signals. The key contribution is the integration of a large vision-language model into a closed-loop tactile interaction framework, establishing an end-to-end vision-to-tactile cross-modal mapping. Experiments report an average recognition accuracy of 84.67% across five distinguishable auditory-tactile patterns, and temperature estimates within ±8°C of ground truth in 86.7% of 15 test scenarios.
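To make the recognition stage concrete, the following is a minimal sketch of a ConvNeXt material classifier in PyTorch. The backbone size (convnext_tiny), label set, and head replacement are illustrative assumptions; the paper does not publish its exact classifier head or training details here.

```python
# Illustrative ConvNeXt material classifier (assumed label set and backbone size).
import torch
from torchvision import models, transforms
from PIL import Image

MATERIALS = ["metal", "wood", "fabric", "glass", "plastic"]  # hypothetical labels

model = models.convnext_tiny(weights="IMAGENET1K_V1")
# Swap the ImageNet head for a material head; in practice this layer would be
# fine-tuned on labeled material images before use.
model.classifier[2] = torch.nn.Linear(model.classifier[2].in_features, len(MATERIALS))
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def classify_material(image_path: str) -> str:
    """Return the predicted material label for a single image."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        logits = model(x)
    return MATERIALS[int(logits.argmax(dim=1))]
```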

📝 Abstract
This paper introduces HapticVLM, a novel multimodal system that integrates vision-language reasoning with deep convolutional networks to enable real-time haptic feedback. HapticVLM leverages a ConvNeXt-based material recognition module to generate robust visual embeddings for accurate identification of object materials, while a state-of-the-art Vision-Language Model (Qwen2-VL-2B-Instruct) infers ambient temperature from environmental cues. The system synthesizes tactile sensations by delivering vibrotactile feedback through speakers and thermal cues via a Peltier module, thereby bridging the gap between visual perception and tactile experience. Experimental evaluations demonstrate an average recognition accuracy of 84.67% across five distinct auditory-tactile patterns and a temperature estimation accuracy of 86.7% under a tolerance-based evaluation with an 8°C margin of error across 15 scenarios. Although promising, the current study is limited by its small set of prominent patterns and a modest participant pool. Future work will focus on expanding the range of tactile patterns and conducting larger user studies to further refine and validate the system's performance. Overall, HapticVLM presents a significant step toward context-aware, multimodal haptic interaction with potential applications in virtual reality and assistive technologies.
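As a sketch of the temperature-inference stage, the snippet below prompts Qwen2-VL-2B-Instruct with a scene image, following the model's published Hugging Face usage pattern. The prompt wording and single-number answer format are assumptions, not the paper's documented protocol.

```python
# Hedged sketch: asking Qwen2-VL-2B-Instruct for an ambient-temperature estimate.
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "scene.jpg"},  # hypothetical input frame
        {"type": "text", "text": "Estimate the ambient temperature of this scene "
                                 "in degrees Celsius. Answer with a single number."},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=16)
# Keep only the newly generated tokens, then decode the answer.
answer = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:],
                                skip_special_tokens=True)[0]
print(answer)  # e.g. "22"
```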
Problem

Research questions and friction points this paper is trying to address.

How can the cross-modal gap between visual perception and tactile experience be bridged in real time?
How can object materials be recognized and ambient temperature inferred from visual input alone?
How can visual understanding be translated into discriminable vibrotactile and thermal feedback?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates vision-language reasoning with deep convolutional networks
Uses ConvNeXt for material recognition and Qwen2-VL for temperature inference
Delivers vibrotactile and thermal feedback via speakers and a Peltier module (sketched in the code below)
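The output stage can be approximated as a per-material vibrotactile waveform played through a speaker, plus a signed drive level for the Peltier module derived from the estimated temperature. The carrier frequencies, modulation scheme, and Peltier scaling below are illustrative assumptions rather than the paper's published parameters.

```python
# Illustrative output stage: speaker-driven vibrotactile patterns and a Peltier drive.
import numpy as np
import sounddevice as sd  # pip install sounddevice

SAMPLE_RATE = 44100

# Hypothetical mapping: material -> (carrier frequency in Hz, gain).
TACTILE_PATTERNS = {
    "metal":   (250, 0.8),
    "wood":    (120, 0.6),
    "fabric":  (60,  0.4),
    "glass":   (300, 0.7),
    "plastic": (180, 0.5),
}

def play_tactile(material: str, duration_s: float = 1.0) -> None:
    """Play an amplitude-modulated carrier so patterns remain discriminable by feel."""
    freq, gain = TACTILE_PATTERNS[material]
    t = np.linspace(0, duration_s, int(SAMPLE_RATE * duration_s), endpoint=False)
    envelope = 0.5 + 0.5 * np.sin(2 * np.pi * 4 * t)  # 4 Hz modulation (assumed)
    wave = gain * np.sin(2 * np.pi * freq * t) * envelope
    sd.play(wave.astype(np.float32), SAMPLE_RATE, blocking=True)

def peltier_drive(estimated_c: float, skin_neutral_c: float = 33.0) -> float:
    """Map an estimated scene temperature to a signed drive in [-1, 1].

    Positive heats, negative cools; the 15 °C scaling is a made-up constant.
    """
    return float(np.clip((estimated_c - skin_neutral_c) / 15.0, -1.0, 1.0))
```

In a full loop, the classifier's output would select the waveform while the VLM's temperature estimate sets the Peltier drive, closing the vision-to-tactile mapping described above.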