🤖 AI Summary
To address the lack of phase adaptability in vision–tactile fusion for dexterous manipulation, this paper proposes a force-guided adaptive vision–tactile fusion framework. Without requiring manual annotations, it introduces a force-driven predictive attention mechanism that dynamically modulates modality weights to match stage-specific perceptual demands; in addition, a self-supervised future force prediction task is designed to strengthen tactile representation learning. Key contributions include: (1) the first force-driven, temporally adaptive multimodal attention mechanism; (2) the first tactile-enhanced fusion architecture integrating self-supervised force prediction; and (3) modality weights that autonomously adapt to the current manipulation phase. Evaluated on three contact-rich, fine-grained manipulation tasks in real-world experiments, the framework achieves a mean success rate of 93%, demonstrating both the effectiveness and the soundness of the proposed approach.
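The core idea of force-guided attention can be pictured as a small gating network that reads the current force signal and reweights the visual and tactile embeddings before fusion. The PyTorch sketch below is purely illustrative; the module name `ForceGuidedFusion`, the feature and force dimensions, and the two-way softmax gating are assumptions, not the paper's actual implementation.

```python
# Illustrative sketch (assumed design, not the paper's code): a force-guided
# attention module that maps the current force reading to per-modality weights.
import torch
import torch.nn as nn

class ForceGuidedFusion(nn.Module):
    def __init__(self, feat_dim: int = 256, force_dim: int = 6):
        super().__init__()
        # Small MLP mapping the force signal to two attention logits
        # (one for vision, one for touch).
        self.force_to_weights = nn.Sequential(
            nn.Linear(force_dim, 64), nn.ReLU(), nn.Linear(64, 2)
        )
        self.out = nn.Linear(feat_dim, feat_dim)

    def forward(self, vis_feat: torch.Tensor, tac_feat: torch.Tensor,
                force: torch.Tensor):
        # vis_feat, tac_feat: (B, feat_dim); force: (B, force_dim)
        weights = torch.softmax(self.force_to_weights(force), dim=-1)  # (B, 2)
        # Weighted sum of the two modality embeddings.
        fused = weights[:, 0:1] * vis_feat + weights[:, 1:2] * tac_feat
        # Returning the weights lets one inspect how attention shifts by stage.
        return self.out(fused), weights
```

Under such a design, near-zero forces during free-space reaching would push attention toward vision, while rising contact forces would shift it toward touch, which is consistent with the stage-specific behavior the summary describes.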
📝 Abstract
Effectively utilizing multi-sensory data is important for robots to generalize across diverse tasks. However, the heterogeneous nature of these modalities makes fusion challenging. Existing methods propose strategies to obtain comprehensively fused features but often ignore the fact that each modality requires different levels of attention at different manipulation stages. To address this, we propose a force-guided attention fusion module that adaptively adjusts the weights of visual and tactile features without human labeling. We also introduce a self-supervised future force prediction auxiliary task to reinforce the tactile modality, mitigate data imbalance, and encourage proper attention adjustment. Our method achieves an average success rate of 93% across three fine-grained, contact-rich tasks in real-world experiments. Further analysis shows that our policy appropriately adjusts its attention to each modality at different manipulation stages. The videos can be viewed at https://adaptac-dex.github.io/.
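The self-supervised auxiliary task needs no manual labels because the prediction targets are simply force readings recorded later in the same trajectory. The sketch below shows one plausible form of such an auxiliary head; the class name `FutureForcePredictor`, the prediction horizon, and all dimensions are assumptions for illustration only.

```python
# Illustrative sketch (assumed design): an auxiliary head that predicts a short
# horizon of future force readings from the tactile embedding, trained with a
# regression loss added to the main policy objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FutureForcePredictor(nn.Module):
    def __init__(self, feat_dim: int = 256, force_dim: int = 6, horizon: int = 5):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, horizon * force_dim)
        )
        self.horizon, self.force_dim = horizon, force_dim

    def forward(self, tac_feat: torch.Tensor) -> torch.Tensor:
        # tac_feat: (B, feat_dim) tactile embedding from the policy encoder.
        pred = self.head(tac_feat)
        return pred.view(-1, self.horizon, self.force_dim)

# Training-time usage (hypothetical): the targets come from the logged data
# stream, so no human annotation is required.
#   aux_loss = F.mse_loss(predictor(tac_feat), future_forces)
#   total_loss = policy_loss + aux_weight * aux_loss
```

An auxiliary loss of this kind pressures the tactile encoder to capture contact dynamics even when tactile signals are sparse in the demonstrations, which is the role the abstract attributes to the future force prediction task.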