🤖 AI Summary
Diffusion policies (DPs) suffer from high inference latency due to their iterative denoising process, while existing knowledge distillation methods—such as consistency policies (CP)—reduce latency only at the cost of extensive retraining. To address this, we propose RAGDP, the first framework to integrate retrieval-augmented generation (RAG) into DP acceleration. RAGDP constructs an observation-action vector database from expert demonstrations and, at inference time, retrieves the expert action most similar to the current observation, injecting it at an intermediate denoising step to shorten the denoising trajectory—requiring no additional training. By tightly coupling retrieval with the diffusion process in a step-aware manner, RAGDP achieves efficient, training-free acceleration. Experiments show that even at a 20× speedup over the base DP, RAGDP remains about 7% more accurate than distillation-based methods such as CP. Its core contribution lies in enabling high-fidelity policy execution with negligible computational overhead, bridging the gap between diffusion-based representation learning and real-time robotic control.
📝 Abstract
Diffusion Policies (DPs) have attracted attention for their ability to achieve significant accuracy improvements in various imitation learning tasks. However, DPs depend on Diffusion Models, which require multiple denoising steps to generate a single action, resulting in long generation times. To solve this problem, knowledge distillation-based methods such as Consistency Policy (CP) have been proposed. However, these methods require a significant amount of training time, especially for difficult tasks. In this study, we propose RAGDP (Retrieve-Augmented Generation for Diffusion Policies), a novel framework that uses a knowledge base to expedite the inference of pre-trained DPs without any additional training. Concretely, RAGDP encodes observation-action pairs through the DP encoder to construct a vector database of expert demonstrations. During inference, the current observation is embedded, and the most similar expert action is retrieved. This retrieved action is injected at an intermediate denoising step, reducing the number of steps required compared to the original diffusion process. We show that applying RAGDP to both the base model and existing acceleration methods improves the accuracy-speed trade-off with no additional training. Even at a 20× speedup, RAGDP maintains an accuracy advantage of 7% over distillation models such as CP.
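The inference pipeline described above—embed the observation, retrieve the nearest expert action, forward-noise it to an intermediate step, and run only the remaining denoising steps—can be sketched as follows. This is a minimal illustration with hypothetical names (`retrieve`, `ragdp_infer`, the toy noise schedule and denoiser); the paper's actual encoder, noise schedule, and DP interfaces are not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Vector database of expert demonstrations: observation embeddings
# paired with the expert actions taken (toy random data for illustration).
embeddings = rng.normal(size=(100, 16))  # encoded observations
actions = rng.normal(size=(100, 7))      # corresponding expert actions


def retrieve(obs_embedding):
    """Return the expert action whose observation embedding is most
    similar (cosine similarity) to the current observation."""
    sims = embeddings @ obs_embedding / (
        np.linalg.norm(embeddings, axis=1) * np.linalg.norm(obs_embedding) + 1e-8
    )
    return actions[np.argmax(sims)]


def ragdp_infer(obs_embedding, denoise_step, total_steps=100, start_step=5):
    """Instead of denoising pure noise over `total_steps`, forward-noise the
    retrieved expert action to an intermediate level and denoise from there,
    so only `start_step` denoising calls remain."""
    a_ref = retrieve(obs_embedding)
    t = start_step / total_steps  # fraction of the noise schedule (assumed form)
    x = np.sqrt(1 - t) * a_ref + np.sqrt(t) * rng.normal(size=a_ref.shape)
    for step in reversed(range(start_step)):
        x = denoise_step(x, step)
    return x


# Toy stand-in for the DP's denoiser: nudges the sample toward the mean action.
mean_action = actions.mean(axis=0)
toy_denoiser = lambda x, step: x + 0.2 * (mean_action - x)

query = rng.normal(size=16)
action = ragdp_infer(query, toy_denoiser)
print(action.shape)  # (7,)
```

The key saving is in the loop: only `start_step` denoiser calls run instead of `total_steps`, which is where the reported speedups come from; the retrieved action supplies a good starting point so accuracy is preserved.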