🤖 AI Summary
This work addresses the challenge that purely vision-based policies struggle to exploit the rich information about contact dynamics and interaction quality encoded in force/torque signals during contact-rich, force-sensitive robotic manipulation tasks. To this end, the paper introduces a contact-aware adaptive multimodal fusion mechanism within a diffusion policy framework. The approach dynamically activates force/torque inputs based on the detected contact state: it relies solely on visual observations during non-contact phases and adaptively fuses the visual and haptic modalities once contact occurs. This strategy significantly enhances policy efficiency and robustness, achieving a 14% improvement in task success rate over the strongest baseline across multiple manipulation benchmarks and thereby validating the efficacy of demand-driven integration of force feedback.
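As a rough sketch of how such contact-gated fusion could be wired up (an illustrative assumption based on the summary above, not the paper's released implementation; the module structure, feature dimensions, and force threshold are all hypothetical), consider the following PyTorch snippet:

```python
import torch
import torch.nn as nn

class ContactGatedFusion(nn.Module):
    """Hypothetical contact-aware fusion: F/T features are blended in only
    when a contact indicator fires; otherwise the policy conditions on
    vision alone. All dimensions and the threshold are assumptions."""

    def __init__(self, vis_dim=256, ft_dim=64, out_dim=256, force_thresh=1.0):
        super().__init__()
        # Encode the raw 6-D wrench (force + torque) into a feature vector.
        self.ft_encoder = nn.Sequential(
            nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, ft_dim)
        )
        # Learned gate decides how much F/T to blend in once contact occurs.
        self.gate = nn.Sequential(
            nn.Linear(vis_dim + ft_dim, 1), nn.Sigmoid()
        )
        self.fuse = nn.Linear(vis_dim + ft_dim, out_dim)
        self.force_thresh = force_thresh

    def forward(self, vis_feat, wrench):
        # wrench: (B, 6) raw F/T; declare contact when the force norm
        # exceeds a threshold (a simple stand-in for contact detection).
        in_contact = (wrench[:, :3].norm(dim=-1, keepdim=True)
                      > self.force_thresh).float()
        ft_feat = self.ft_encoder(wrench)
        # Adaptive gate, hard-masked to zero during non-contact phases.
        g = self.gate(torch.cat([vis_feat, ft_feat], dim=-1)) * in_contact
        fused = self.fuse(torch.cat([vis_feat, g * ft_feat], dim=-1))
        return fused  # conditioning vector for the diffusion policy head
```

When `in_contact` is zero, the gate zeroes out the F/T branch and the fused conditioning reduces to a function of vision alone, matching the demand-driven behavior described above.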
📄 Abstract
Vision-based policies have achieved strong performance in robotic manipulation thanks to the accessibility and richness of visual observations. However, purely visual sensing becomes insufficient in contact-rich and force-sensitive tasks, where force/torque (F/T) signals provide critical information about contact dynamics, alignment, and interaction quality. Although various strategies have been proposed to integrate vision and F/T signals, including auxiliary prediction objectives, mixture-of-experts architectures, and contact-aware gating mechanisms, a systematic comparison of these approaches remains lacking. In this work, we present a comparative study of different F/T-vision integration strategies within diffusion-based manipulation policies. In addition, we propose an adaptive integration strategy that ignores F/T signals during non-contact phases while adaptively leveraging both visual and F/T information during contact. Experimental results demonstrate that our method outperforms the strongest baseline by 14% in success rate, highlighting the importance of contact-aware multimodal fusion for robotic manipulation.