FDPP: Fine-tune Diffusion Policy with Human Preference

📅 2025-01-14

📈 Citations: 0

✨ Influential: 0

career value

224K/year

🤖 AI Summary

In robot imitation learning, pre-trained policies often fail to adapt to new user preferences without degrading original task performance. This paper proposes a novel human-preference fine-tuning framework for diffusion-based policies. First, a differentiable reward function is learned from pairwise preference comparisons; then, the diffusion policy is efficiently fine-tuned via KL-regularized reinforcement learning (PPO or SAC), where the KL divergence constrains updates to preserve task performance. This work pioneers the integration of preference learning with diffusion policy adaptation and introduces task-preserving KL regularization to jointly achieve fine-grained behavioral alignment and policy stability. Evaluated across diverse robotic manipulation tasks, the method maintains task success rates above 92%, improves preference alignment accuracy by 37%, and significantly mitigates overfitting to preference data.

Technology Category

Application Category

📝 Abstract

Imitation learning from human demonstrations enables robots to perform complex manipulation tasks and has recently witnessed huge success. However, these techniques often struggle to adapt behavior to new preferences or changes in the environment. To address these limitations, we propose Fine-tuning Diffusion Policy with Human Preference (FDPP). FDPP learns a reward function through preference-based learning. This reward is then used to fine-tune the pre-trained policy with reinforcement learning (RL), resulting in alignment of pre-trained policy with new human preferences while still solving the original task. Our experiments across various robotic tasks and preferences demonstrate that FDPP effectively customizes policy behavior without compromising performance. Additionally, we show that incorporating Kullback-Leibler (KL) regularization during fine-tuning prevents over-fitting and helps maintain the competencies of the initial policy.

Problem

Research questions and friction points this paper is trying to address.

Robot Adaptability

Preference Learning

Environmental Changes

Innovation

Methods, ideas, or system contributions that make the work stand out.

FDPP

KL Regularization

Preference Adaptation

🔎 Similar Papers

No similar papers found.

OpenAI

$380K – $445K • Offers Equity

San Francisco, CA, USA

Research Scientist Intern, Robotic Control Policy (PhD)