Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation

📅 2025-03-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address poor dynamic adaptability and response latency in contact-rich dexterous manipulation—stemming from the absence of real-time tactile feedback in vision-based imitation learning and inadequate haptic feedback in teleoperation systems—this paper proposes a visual-tactile reactive policy learning framework that integrates both modalities for closed-loop, real-time control. Key contributions include: (1) a novel slow-fast hierarchical diffusion policy architecture, where a slow latent diffusion model generates high-level action chunks while a fast asymmetric tokenizer enables millisecond-scale closed-loop tactile control; and (2) TactAR, a low-cost teleoperation system with augmented reality tactile feedback. Evaluated on three challenging contact-rich manipulation tasks, the approach significantly outperforms vision-only imitation learning baselines, reducing tactile response latency by 62%. It supports diverse tactile/force sensors, demonstrating strong generalizability and practical deployability.

📝 Abstract
Humans can accomplish complex contact-rich tasks using vision and touch, with highly reactive capabilities such as quick adjustments to environmental changes and adaptive control of contact forces; however, this remains challenging for robots. Existing visual imitation learning (IL) approaches rely on action chunking to model complex behaviors, which lacks the ability to respond instantly to real-time tactile feedback during the chunk execution. Furthermore, most teleoperation systems struggle to provide fine-grained tactile / force feedback, which limits the range of tasks that can be performed. To address these challenges, we introduce TactAR, a low-cost teleoperation system that provides real-time tactile feedback through Augmented Reality (AR), along with Reactive Diffusion Policy (RDP), a novel slow-fast visual-tactile imitation learning algorithm for learning contact-rich manipulation skills. RDP employs a two-level hierarchy: (1) a slow latent diffusion policy for predicting high-level action chunks in latent space at low frequency, (2) a fast asymmetric tokenizer for closed-loop tactile feedback control at high frequency. This design enables both complex trajectory modeling and quick reactive behavior within a unified framework. Through extensive evaluation across three challenging contact-rich tasks, RDP significantly improves performance compared to state-of-the-art visual IL baselines through rapid response to tactile / force feedback. Furthermore, experiments show that RDP is applicable across different tactile / force sensors. Code and videos are available on https://reactive-diffusion-policy.github.io/.
Problem

Research questions and friction points this paper is trying to address.

Robots lack reactive capabilities for contact-rich tasks.
Existing visual imitation learning lacks real-time tactile feedback.
Teleoperation systems struggle with fine-grained tactile feedback.
Innovation

Methods, ideas, or system contributions that make the work stand out.

TactAR system provides real-time tactile AR feedback.
Reactive Diffusion Policy uses slow-fast learning hierarchy.
RDP enables rapid tactile feedback response in manipulation.
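The slow-fast hierarchy can be illustrated with a minimal sketch: a slow policy emits a chunk of latent actions at low frequency, and a fast decoder turns each latent into a motor command at high frequency, correcting it with the latest tactile reading. All class and function names below are illustrative stand-ins (fixed linear maps in place of the actual diffusion model and asymmetric tokenizer), not the authors' implementation.

```python
import numpy as np

class SlowLatentPolicy:
    """Stand-in for the slow latent diffusion policy: predicts a chunk of
    latent actions from an observation at low frequency."""
    def __init__(self, obs_dim=4, latent_dim=8, chunk_len=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(latent_dim, obs_dim))  # obs -> latent map
        self.chunk_len = chunk_len

    def predict_chunk(self, obs):
        z = self.W @ obs                       # one high-level latent plan
        return np.tile(z, (self.chunk_len, 1))  # repeated over the chunk

class FastTactileDecoder:
    """Stand-in for the fast asymmetric tokenizer: decodes each latent into
    an action, corrected by the current tactile/force error."""
    def __init__(self, latent_dim=8, action_dim=3, gain=0.5, seed=1):
        rng = np.random.default_rng(seed)
        self.D = rng.normal(size=(action_dim, latent_dim))
        self.gain = gain  # feedback gain (illustrative value)

    def decode(self, z, tactile_error):
        base = self.D @ z
        # closed-loop correction against the measured contact-force error
        return base - self.gain * tactile_error

def run_control_loop(obs, tactile_stream, slow, fast):
    """Slow policy runs once per chunk; fast decoder runs every step."""
    chunk = slow.predict_chunk(obs)
    return np.stack([fast.decode(z, t) for z, t in zip(chunk, tactile_stream)])

slow = SlowLatentPolicy()
fast = FastTactileDecoder()
tactile = np.zeros((slow.chunk_len, 3))
tactile[5:] = 0.2  # unexpected contact appears mid-chunk
acts = run_control_loop(np.ones(4), tactile, slow, fast)
```

In this toy loop the actions before the contact event are identical, while the steps after it shift immediately by `-gain * 0.2`, mimicking how the fast branch reacts within a chunk without waiting for the slow policy to replan.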
👥 Authors
Han Xue — Shanghai Jiao Tong University
Jieji Ren — Shanghai Jiao Tong University
Wendi Chen — Ph.D. Student, Shanghai Jiao Tong University (Robot Learning, Embodied AI, Machine Learning)
Gu Zhang — Tsinghua University (Robotics, Robot Learning)
Yuan Fang — Shanghai Jiao Tong University
Guoying Gu — Shanghai Jiao Tong University
Huazhe Xu — Tsinghua University (Embodied AI, Reinforcement Learning, Computer Vision, Deep Learning)
Cewu Lu — Shanghai Jiao Tong University