Touch-R1: Reinforcing Touch Reasoning in MLLMs

📅 2026-05-26
🤖 AI Summary
This work addresses the difficulty multimodal large language models have in using physical tactile evidence to correct vision-based priors during reasoning. To this end, the authors introduce TouchReason-1M, a large-scale tactile–language dataset, and TouchReason-Bench, an evaluation benchmark, and propose Touch-R1, a reinforcement learning–based tactile reasoning framework built on Qwen2.5-VL-7B. The approach combines a tactile-grounded GRPO training objective with a tactile-use reward that credits the model only when real tactile inputs improve correctness over counterfactual controls. Experimental results show that the resulting model, Touch-R1-7B, outperforms Octopi-13B by 18.4% and GPT-4o by 24.7% on average on TouchReason-Bench.
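To make the "tactile-grounded GRPO objective" concrete, the sketch below shows the generic GRPO recipe: sum several reward terms into one scalar per sampled completion, then standardize rewards within the group of completions drawn for the same prompt. This is a minimal sketch under stated assumptions, not the authors' implementation. The `<think>`/`<answer>` template, the `WEIGHTS` values, and the function names are hypothetical; only the four reward terms themselves come from the abstract.

```python
import re
import statistics

# Hypothetical R1-style output template; the paper's exact tags are not given here.
FORMAT_PATTERN = re.compile(r"^<think>.+?</think>\s*<answer>.+?</answer>$", re.DOTALL)

# Hypothetical weights for the four reward terms named in the abstract.
WEIGHTS = {"accuracy": 1.0, "consistency": 0.5, "format": 0.2, "tactile_use": 0.5}


def format_reward(text: str) -> float:
    """Return 1.0 if the completion follows the <think>/<answer> template, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(text.strip()) else 0.0


def total_reward(components: dict[str, float]) -> float:
    """Weighted sum of per-term rewards (each term assumed to lie in [0, 1])."""
    return sum(WEIGHTS[name] * value for name, value in components.items())


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division finite when every completion got the same reward.
    return [(r - mean) / (std + eps) for r in rewards]
```

Because advantages are computed relative to sibling completions, GRPO needs no learned value critic. A group in which every completion earns the same reward contributes zero advantage and therefore no gradient signal.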
📝 Abstract
While rule-based reinforcement learning has recently catalyzed explicit reasoning in multimodal models, tactile reasoning remains largely underexplored. Existing tactile-language models primarily rely on supervised or contrastive objectives, which limits their capacity to ground predictions in physical evidence or rectify misleading visual priors. Tactile reasoning introduces two modality-specific challenges: the ordinal nature of physical attributes (e.g., hardness, roughness) and the cross-sensor distribution shifts inherent in optical tactile hardware. In this work, we introduce TouchReason-1M, a large-scale multimodal dataset comprising over 1M synchronized tactile pairs across four distinct sensors, and TouchReason-Bench, a rigorous framework for evaluating tactile perception and visual-tactile conflict resolution. Building upon these, we propose Touch-R1, a tactile reasoning MLLM based on Qwen2.5-VL-7B. Touch-R1 is trained via a tactile-grounded GRPO objective that combines ordinal-aware accuracy, cross-sensor physical consistency, structured-format control, and an input-side tactile grounding objective. Specifically, the tactile-use reward assigns credit only when authentic tactile inputs yield higher correctness than counterfactual controls in which the tactile stream is removed, shuffled, or noise-masked. On TouchReason-Bench, Touch-R1-7B outperforms Octopi-13B by 18.4% and GPT-4o by 24.7% on average. Its structured reasoning traces reveal emergent behaviors of probing, comparison, and revision, demonstrating that R1-style reasoning can be effectively grounded in physical contact.
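The tactile-use reward described in the abstract can be read as a counterfactual test: score the model with the real tactile stream, score it again under each control, and pay out only if the real input wins. The sketch below is one possible reading, not the authors' code. The `Scorer` callable (which would run the policy and grade the answer), the strict "beat the best control" rule, the optional `margin`, and full replacement by Gaussian noise as the "noise-masked" control are all assumptions.

```python
import random
from typing import Callable, Optional, Sequence

TactileFrame = Sequence[float]
# Hypothetical scorer: runs the policy on a (possibly altered) tactile input and
# returns answer correctness in [0, 1]; None means the tactile stream was removed.
Scorer = Callable[[Optional[TactileFrame]], float]


def make_controls(
    frame: TactileFrame, other_frames: Sequence[TactileFrame], rng: random.Random
) -> dict[str, Optional[TactileFrame]]:
    """Build the three counterfactual controls named in the abstract."""
    return {
        "removed": None,  # tactile stream dropped entirely
        "shuffled": rng.choice(other_frames),  # tactile stream from another sample
        "noise": [rng.gauss(0.0, 1.0) for _ in frame],  # same-size Gaussian noise
    }


def tactile_use_reward(
    score: Scorer,
    frame: TactileFrame,
    other_frames: Sequence[TactileFrame],
    rng: random.Random,
    margin: float = 0.0,
) -> float:
    """Return 1.0 only if the real tactile input beats every counterfactual control."""
    real = score(frame)
    controls = make_controls(frame, other_frames, rng)
    best_control = max(score(c) for c in controls.values())
    return 1.0 if real > best_control + margin else 0.0
```

Under this rule, a policy that answers correctly no matter what it touches (for example, by relying on the image alone) earns nothing from this term. That is the behavior the abstract's "input-side tactile grounding" is meant to discourage.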
Problem

Research questions and friction points this paper is trying to address.

tactile reasoning
multimodal language models
physical grounding
ordinal attributes
cross-sensor distribution shift
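The "ordinal attributes" problem is that a prediction of "hard" for a "very hard" object is less wrong than "very soft", yet exact-match accuracy scores both as zero. The abstract addresses this with an ordinal-aware accuracy reward. The sketch below assumes linear partial credit over a hypothetical five-level hardness scale (`HARDNESS_LEVELS`). The benchmark's actual label set and decay shape are not specified here.

```python
from typing import Sequence

# Hypothetical five-level hardness scale; the benchmark's actual labels may differ.
HARDNESS_LEVELS = ("very soft", "soft", "medium", "hard", "very hard")


def ordinal_accuracy(pred: str, target: str, levels: Sequence[str] = HARDNESS_LEVELS) -> float:
    """Partial credit that decays linearly with distance on the ordinal scale."""
    if pred not in levels:
        return 0.0  # unparseable or out-of-vocabulary answer
    distance = abs(levels.index(pred) - levels.index(target))
    return 1.0 - distance / (len(levels) - 1)
```

For example, `ordinal_accuracy("hard", "very hard")` returns 0.75, while predicting the opposite end of the scale returns 0.0. This gives the policy a graded signal instead of a binary one.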
Innovation

Methods, ideas, or system contributions that make the work stand out.

tactile reasoning
multimodal LLM
ordinal-aware learning
cross-sensor consistency
tactile grounding
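"Cross-sensor consistency" targets the distribution shift between optical tactile sensors: the same physical object should receive the same attribute judgment regardless of which sensor touched it. The sketch below is a hypothetical illustration, not the paper's formulation. It scores each sensor's ordinal prediction by its distance to the median prediction across sensors. The sensor names (`sensor_a` through `sensor_d`) and the median-based consensus are assumptions.

```python
import statistics


def cross_sensor_consistency(pred_levels: dict[str, int], num_levels: int) -> dict[str, float]:
    """Score each sensor's ordinal prediction by its distance to the cross-sensor median."""
    consensus = statistics.median(list(pred_levels.values()))
    span = num_levels - 1  # largest possible distance on the scale
    return {
        sensor: 1.0 - abs(level - consensus) / span
        for sensor, level in pred_levels.items()
    }
```

The median is used so that a single outlier sensor does not drag the consensus toward itself. Unlike the accuracy term, this signal needs no ground-truth label, only several readings of the same object.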