Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation

📅 2025-03-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address poor dynamic adaptability and response latency in contact-rich dexterous manipulation—stemming from the absence of real-time tactile feedback in vision-based imitation learning and inadequate haptic feedback in teleoperation systems—this paper proposes a visual-tactile reactive policy learning framework that integrates both modalities for closed-loop, real-time control. Key contributions include: (1) a novel slow-fast hierarchical diffusion policy architecture, where a slow latent diffusion model generates high-level action chunks while a fast asymmetric tokenizer enables millisecond-scale closed-loop tactile control; and (2) TactAR, a low-cost teleoperation system with augmented reality tactile feedback. Evaluated on three challenging contact-rich manipulation tasks, the approach significantly outperforms vision-only imitation learning baselines, reducing tactile response latency by 62%. It supports diverse tactile/force sensors, demonstrating strong generalizability and practical deployability.

📝 Abstract
Humans can accomplish complex contact-rich tasks using vision and touch, with highly reactive capabilities such as quick adjustments to environmental changes and adaptive control of contact forces; however, this remains challenging for robots. Existing visual imitation learning (IL) approaches rely on action chunking to model complex behaviors, which lacks the ability to respond instantly to real-time tactile feedback during the chunk execution. Furthermore, most teleoperation systems struggle to provide fine-grained tactile / force feedback, which limits the range of tasks that can be performed. To address these challenges, we introduce TactAR, a low-cost teleoperation system that provides real-time tactile feedback through Augmented Reality (AR), along with Reactive Diffusion Policy (RDP), a novel slow-fast visual-tactile imitation learning algorithm for learning contact-rich manipulation skills. RDP employs a two-level hierarchy: (1) a slow latent diffusion policy for predicting high-level action chunks in latent space at low frequency, (2) a fast asymmetric tokenizer for closed-loop tactile feedback control at high frequency. This design enables both complex trajectory modeling and quick reactive behavior within a unified framework. Through extensive evaluation across three challenging contact-rich tasks, RDP significantly improves performance compared to state-of-the-art visual IL baselines through rapid response to tactile / force feedback. Furthermore, experiments show that RDP is applicable across different tactile / force sensors. Code and videos are available on https://reactive-diffusion-policy.github.io/.
Problem

Research questions and friction points this paper is trying to address.

Robots lack reactive capabilities for contact-rich tasks.
Existing visual imitation learning lacks real-time tactile feedback.
Teleoperation systems struggle with fine-grained tactile feedback.
Innovation

Methods, ideas, or system contributions that make the work stand out.

TactAR system provides real-time tactile AR feedback.
Reactive Diffusion Policy uses slow-fast learning hierarchy.
RDP enables rapid tactile feedback response in manipulation.
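The slow-fast hierarchy can be illustrated with a minimal sketch: a slow policy emits a chunk of latent actions at low frequency, and a fast decoder turns each latent into a motor command at high frequency, correcting it with the latest tactile reading. All class and function names below are illustrative stand-ins (fixed linear maps in place of the actual diffusion model and asymmetric tokenizer), not the authors' implementation.

```python
import numpy as np

class SlowLatentPolicy:
    """Stand-in for the slow latent diffusion policy: predicts a chunk of
    latent actions from an observation at low frequency."""
    def __init__(self, obs_dim=4, latent_dim=8, chunk_len=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(latent_dim, obs_dim))  # obs -> latent map
        self.chunk_len = chunk_len

    def predict_chunk(self, obs):
        z = self.W @ obs                       # one high-level latent plan
        return np.tile(z, (self.chunk_len, 1))  # repeated over the chunk

class FastTactileDecoder:
    """Stand-in for the fast asymmetric tokenizer: decodes each latent into
    an action, corrected by the current tactile/force error."""
    def __init__(self, latent_dim=8, action_dim=3, gain=0.5, seed=1):
        rng = np.random.default_rng(seed)
        self.D = rng.normal(size=(action_dim, latent_dim))
        self.gain = gain  # feedback gain (illustrative value)

    def decode(self, z, tactile_error):
        base = self.D @ z
        # closed-loop correction against the measured contact-force error
        return base - self.gain * tactile_error

def run_control_loop(obs, tactile_stream, slow, fast):
    """Slow policy runs once per chunk; fast decoder runs every step."""
    chunk = slow.predict_chunk(obs)
    return np.stack([fast.decode(z, t) for z, t in zip(chunk, tactile_stream)])

slow = SlowLatentPolicy()
fast = FastTactileDecoder()
tactile = np.zeros((slow.chunk_len, 3))
tactile[5:] = 0.2  # unexpected contact appears mid-chunk
acts = run_control_loop(np.ones(4), tactile, slow, fast)
```

In this toy loop the actions before the contact event are identical, while the steps after it shift immediately by `-gain * 0.2`, mimicking how the fast branch reacts within a chunk without waiting for the slow policy to replan.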
👥 Authors
Han Xue — Shanghai Jiao Tong University
Jieji Ren — Shanghai Jiao Tong University
Wendi Chen — Ph.D. Student, Shanghai Jiao Tong University (Robot Learning, Embodied AI, Machine Learning)
Gu Zhang — Tsinghua University (Robotics, Robot Learning)
Yuan Fang — Shanghai Jiao Tong University
Guoying Gu — Shanghai Jiao Tong University
Huazhe Xu — Tsinghua University (Embodied AI, Reinforcement Learning, Computer Vision, Deep Learning)
Cewu Lu — Shanghai Jiao Tong University