🤖 AI Summary
Diffusion-based policies suffer from prohibitively high sample complexity and reward attribution challenges when fine-tuned on real robots, requiring extensive physical interaction while contending with delayed or sparse reward feedback. Method: We propose DiWA, a model-based reinforcement learning framework for offline fine-tuning of diffusion policies using a pre-trained offline world model. DiWA formalizes the denoising process of diffusion policies as a Markov decision process and performs closed-loop policy optimization entirely inside the learned world model, eliminating the need for physical interaction during fine-tuning. Contribution/Results: By decoupling policy adaptation from physical execution, DiWA achieves performance gains across all 8 tasks in the CALVIN benchmark via pure offline adaptation. It reduces real-world interaction by 2–3 orders of magnitude compared to model-free RL baselines, significantly improving sample efficiency and deployment safety, and establishing a new paradigm for efficient, low-risk skill optimization on real-world robotic platforms.
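The "denoising process as an MDP" framing can be made concrete with a toy sketch: each denoising step is a transition whose state bundles the observation, the current noisy action, and the step index, and the task reward arrives only after the final step, when the clean action would be executed. Everything below (`toy_denoiser`, the episode structure, the reward shape) is an illustrative assumption, not DiWA's actual implementation.

```python
import random

K = 5  # number of denoising steps in the toy chain

def toy_denoiser(obs, a_noisy, k):
    """Stand-in for the diffusion policy's denoising network:
    nudges the noisy action toward an obs-dependent target."""
    target = obs  # pretend the optimal action equals the observation
    return a_noisy + 0.5 * (target - a_noisy)

def denoising_mdp_episode(obs, reward_fn):
    """One 'episode' of the denoising MDP.

    State:  (obs, current noisy action, denoising step k)
    Action: the denoiser's output (the next, less-noisy action)
    Reward: zero at intermediate steps; the task reward appears
            only after the final denoising step.
    """
    a = random.gauss(0.0, 1.0)  # a_K ~ N(0, 1): start from pure noise
    transitions = []
    for k in range(K, 0, -1):
        a_next = toy_denoiser(obs, a, k)
        r = reward_fn(obs, a_next) if k == 1 else 0.0
        transitions.append(((obs, a, k), a_next, r))
        a = a_next
    return a, transitions

random.seed(0)
final_action, traj = denoising_mdp_episode(
    obs=0.7, reward_fn=lambda o, a: -abs(o - a))
```

Under this view, each denoising step becomes an RL "action" that can receive credit via the final-step reward, which is what makes policy-gradient updates through the denoising chain possible.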
📝 Abstract
Fine-tuning diffusion policies with reinforcement learning (RL) presents significant challenges. The long denoising sequence behind each action prediction impedes effective reward propagation. Moreover, standard RL methods require millions of real-world interactions, posing a major bottleneck for practical fine-tuning. Although prior work frames the denoising process in diffusion policies as a Markov Decision Process to enable RL-based updates, it remains highly sample-inefficient owing to its strong dependence on environment interaction. To bridge this gap, we introduce DiWA, a novel framework that leverages a world model for fine-tuning diffusion-based robotic skills entirely offline with reinforcement learning. Unlike model-free approaches that require millions of environment interactions to fine-tune a repertoire of robot skills, DiWA achieves effective adaptation using a world model trained once on a few hundred thousand offline play interactions. This results in dramatically improved sample efficiency, making the approach significantly more practical and safer for real-world robot learning. On the challenging CALVIN benchmark, DiWA improves performance across eight tasks using only offline adaptation, while requiring orders of magnitude fewer physical interactions than model-free baselines. To our knowledge, this is the first demonstration of fine-tuning diffusion policies for real-world robotic skills using an offline world model. We make the code publicly available at https://diwa.cs.uni-freiburg.de.
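The core idea of fine-tuning against a frozen, pre-trained world model (rather than the physical environment) can be sketched in a few lines: roll the policy out in "imagination" and improve it on imagined returns alone, using zero new physical interactions. The class and function names, the linear toy dynamics, and the grid-search "fine-tuning" below are all illustrative assumptions, not DiWA's actual architecture or optimizer.

```python
class ToyWorldModel:
    """Frozen dynamics + reward model, pre-trained on offline play data."""
    def step(self, state, action):
        next_state = 0.9 * state + 0.1 * action  # learned dynamics (toy)
        reward = -abs(next_state)                # learned reward (toy)
        return next_state, reward

def imagined_return(policy_gain, model, horizon=20):
    """Roll the policy out entirely inside the world model."""
    state, ret = 1.0, 0.0
    for _ in range(horizon):
        action = -policy_gain * state  # toy linear policy
        state, r = model.step(state, action)
        ret += r
    return ret

# Offline "fine-tuning": pick the policy parameter that maximizes
# imagined return -- no real-world interaction is used at this stage.
model = ToyWorldModel()
best_gain = max([0.0, 0.5, 1.0, 2.0],
                key=lambda g: imagined_return(g, model))
```

The real-world interaction budget is then spent only once, on collecting the offline play data that trains the world model, which is why the approach can be orders of magnitude more sample-efficient than model-free fine-tuning.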