GelFusion: Enhancing Robotic Manipulation under Visual Constraints via Visuotactile Fusion

📅 2025-05-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the poor robustness of robotic manipulation under visually degraded conditions (e.g., occlusion, blur) and the performance bottlenecks of imitation learning, this paper proposes a vision–tactile fusion policy learning framework. Methodologically, it introduces (1) a dual-channel GelSight tactile representation encoding both texture-geometry and dynamic interaction features, and (2) a vision-dominant cross-modal cross-attention fusion mechanism that enables tactile signals to compensate for visual deficiencies. Evaluated on surface wiping, peg insertion, and fragile object grasping-and-placing tasks, the framework achieves success rates of 92.3%–96.7%, outperforming the best baseline by an average of 11.5%. These results demonstrate substantial mitigation of visual uncertainty and validate the framework’s tactile-guided cross-modal generalization capability.

Technology Category

Application Category

📝 Abstract
Visuotactile sensing offers rich contact information that can help mitigate performance bottlenecks in imitation learning, particularly under vision-limited conditions, such as ambiguous visual cues or occlusions. Effectively fusing visual and visuotactile modalities, however, presents ongoing challenges. We introduce GelFusion, a framework designed to enhance policies by integrating visuotactile feedback, specifically from high-resolution GelSight sensors. GelFusion using a vision-dominated cross-attention fusion mechanism incorporates visuotactile information into policy learning. To better provide rich contact information, the framework's core component is our dual-channel visuotactile feature representation, simultaneously leveraging both texture-geometric and dynamic interaction features. We evaluated GelFusion on three contact-rich tasks: surface wiping, peg insertion, and fragile object pick-and-place. Outperforming baselines, GelFusion shows the value of its structure in improving the success rate of policy learning.
Problem

Research questions and friction points this paper is trying to address.

Enhance robotic manipulation under vision-limited conditions
Fuse visual and visuotactile sensing effectively
Improve policy learning success in contact-rich tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fuses visual and visuotactile feedback via cross-attention
Uses dual-channel texture-geometric and dynamic features
Integrates high-resolution GelSight sensors for contact data
🔎 Similar Papers
No similar papers found.