FBI: Learning Dexterous In-hand Manipulation with Dynamic Visuotactile Shortcut Policy

📅 2025-08-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Dexterous in-hand manipulation remains challenging due to complex contact dynamics and partial state observability. Existing approaches are predominantly unimodal and fail to synergistically integrate vision and touch during dynamic interaction. This paper proposes FBI (Flow Before Imitation), a framework built around a dynamics-aware latent model that explicitly captures the causal relationship between tactile signals and object motion, overcoming the limitations of conventional static multimodal fusion. FBI combines optical-flow-driven tactile dynamic features with Transformer-based multimodal encoding and employs a single-step diffusion policy for low-latency, real-time control. Evaluated on five dexterous manipulation tasks (two custom-designed and three standard benchmarks) in both simulation and the real world, FBI consistently outperforms state-of-the-art baselines, demonstrating superior generalization and robustness in dynamic, contact-rich scenarios.

📝 Abstract
Dexterous in-hand manipulation is a long-standing challenge in robotics due to complex contact dynamics and partial observability. While humans synergize vision and touch for such tasks, robotic approaches often prioritize one modality, thereby limiting adaptability. This paper introduces Flow Before Imitation (FBI), a visuotactile imitation learning framework that dynamically fuses tactile interactions with visual observations through motion dynamics. Unlike prior static fusion methods, FBI establishes a causal link between tactile signals and object motion via a dynamics-aware latent model. FBI employs a transformer-based interaction module to fuse flow-derived tactile features with visual inputs, and trains a one-step diffusion policy for real-time execution. Extensive experiments demonstrate that the proposed method outperforms baseline methods in both simulation and the real world on two customized in-hand manipulation tasks and three standard dexterous manipulation tasks. Code, models, and additional results are available on the project website: https://sites.google.com/view/dex-fbi.
Problem

Research questions and friction points this paper is trying to address.

Dexterous in-hand manipulation with complex contact dynamics
Partial observability in robotic manipulation tasks
Dynamic fusion of vision and touch modalities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic visuotactile fusion through motion dynamics
Transformer-based interaction module for feature fusion
One-step diffusion policy for real-time execution
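The three innovations above form a pipeline: flow-derived tactile dynamics are fused with visual features by a transformer-style interaction module, and the fused state conditions a one-step diffusion policy. The paper does not publish this pseudocode, so the following is a minimal NumPy sketch of that data flow under stated assumptions: `tactile_flow` approximates tactile motion by frame differencing (a crude stand-in for optical flow), the fusion is a single scaled dot-product attention head with random toy weights, and all dimensions, weights, and function names are hypothetical, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def tactile_flow(t_prev, t_curr):
    """Hypothetical stand-in for optical-flow-driven tactile dynamics:
    frame-differences two consecutive tactile images."""
    return (t_curr - t_prev).reshape(-1)

def attention(q, k, v):
    """Single-head scaled dot-product attention, the core operation of a
    transformer-style interaction module."""
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def one_step_denoise(noisy_action, cond, w_eps, sigma=1.0):
    """One-step diffusion policy head: a single forward pass predicts the
    noise, and the action is recovered in one denoising step (no iterative
    sampling loop), which is what enables low-latency control."""
    eps_pred = np.tanh(np.concatenate([noisy_action, cond]) @ w_eps)
    return noisy_action - sigma * eps_pred

# Toy shapes, for illustration only.
feat_dim = 8
tactile_prev = rng.normal(size=(2, 4))   # 2x4 tactile "image", frame t-1
tactile_curr = rng.normal(size=(2, 4))   # frame t
visual_feat = rng.normal(size=(3, feat_dim))  # 3 visual tokens

# 1) Flow-derived tactile dynamic feature, projected to one token.
w_tac = rng.normal(size=(8, feat_dim))
tactile_token = (tactile_flow(tactile_prev, tactile_curr) @ w_tac)[None, :]

# 2) Fusion: the tactile token attends over the visual tokens.
fused = attention(tactile_token, visual_feat, visual_feat)  # shape (1, 8)

# 3) One-step diffusion policy: denoise a noise sample into an action.
action_dim = 4
w_eps = rng.normal(size=(action_dim + feat_dim, action_dim))
noisy = rng.normal(size=action_dim)
action = one_step_denoise(noisy, fused[0], w_eps)  # shape (4,)
```

The sketch only illustrates the shape of the computation; in the paper the weights would be learned jointly with the dynamics-aware latent model rather than drawn at random.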