Visuo-Tactile World Models

📅 2026-02-05
🤖 AI Summary
This work addresses the physical inconsistencies, such as objects disappearing or teleporting, that purely visual world models produce in contact-rich tasks when occlusion or ambiguous contact cues hide what is happening. To overcome this limitation, the paper presents VT-WM, a multi-task world model that complements vision with tactile sensing. The model learns contact dynamics from both visual and tactile inputs, which improves physical fidelity in imagined autoregressive rollouts and in planning. Experiments show 33% better performance at maintaining object permanence and 29% better compliance with the laws of motion. In zero-shot real-robot experiments, the model achieves up to 35% higher success rates, with the largest gains in multi-step, contact-rich tasks, and it adapts to a novel task from only a limited set of demonstrations.

📝 Abstract
We introduce multi-task Visuo-Tactile World Models (VT-WM), which capture the physics of contact through touch reasoning. By complementing vision with tactile sensing, VT-WM better understands robot-object interactions in contact-rich tasks, avoiding common failure modes of vision-only models under occlusion or ambiguous contact states, such as objects disappearing, teleporting, or moving in ways that violate basic physics. Trained across a set of contact-rich manipulation tasks, VT-WM improves physical fidelity in imagination, achieving 33% better performance at maintaining object permanence and 29% better compliance with the laws of motion in autoregressive rollouts. Moreover, experiments show that grounding in contact dynamics also translates to planning. In zero-shot real-robot experiments, VT-WM achieves up to 35% higher success rates, with the largest gains in multi-step, contact-rich tasks. Finally, VT-WM demonstrates significant downstream versatility, effectively adapting its learned contact dynamics to a novel task and achieving reliable planning success with only a limited set of demonstrations.
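To make the term "autoregressive rollout" concrete, here is a minimal sketch in which each predicted state is fed back in as the next input. It is not the VT-WM architecture, which the abstract does not describe. The names `rollout` and `toy_step`, the 1-D gripper/object state, and the binary contact signal are all hypothetical, chosen only to show why a touch signal can keep an imagined object from moving when nothing is pushing it.

```python
import numpy as np

def rollout(step_fn, z_vision, z_touch, actions):
    """Roll a one-step model forward, feeding each prediction back as input."""
    trajectory = []
    for action in actions:
        z_vision, z_touch = step_fn(z_vision, z_touch, action)
        trajectory.append((z_vision, z_touch))
    return trajectory

def toy_step(z_vision, z_touch, action):
    """Hypothetical stand-in for a learned one-step model (assumption, not from the paper).

    z_vision = [gripper position, object position] on a line.
    z_touch  = [1.0] if the gripper is touching the object, else [0.0].
    The object moves with the gripper only while contact is sensed.
    """
    gripper, obj = z_vision
    in_contact = z_touch[0] > 0.5
    new_gripper = gripper + action
    new_obj = obj + action if in_contact else obj
    touching = abs(new_gripper - new_obj) < 1e-6
    new_touch = np.array([1.0 if touching else 0.0])
    return np.array([new_gripper, new_obj]), new_touch

# Gripper starts at 0, object at 1, no contact; push right three times.
traj = rollout(toy_step, np.array([0.0, 1.0]), np.array([0.0]), [1.0, 1.0, 1.0])
for z_vision, z_touch in traj:
    print(z_vision, z_touch)
```

In this toy, the object stays put during the first step, when the gripper is still approaching, and moves only once contact is registered. A learned model would replace `toy_step` with a network trained on visual and tactile observations.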
Problem

Research questions and friction points this paper is trying to address.

visuo-tactile
contact-rich manipulation
object permanence
physical fidelity
world models
Innovation

Methods, ideas, or system contributions that make the work stand out.

visuo-tactile
world models
contact dynamics
physical reasoning
robotic manipulation