REFINE-DP: Diffusion Policy Fine-tuning for Humanoid Loco-manipulation via Reinforcement Learning

📅 2026-03-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Humanoid robots often suffer from poor command tracking, compounding distributional shift, and task failure in loco-manipulation due to the disconnect between high-level planning and low-level control. This work proposes REFINE-DP, a novel framework that, for the first time, employs PPO-based diffusion policy gradients to jointly optimize a high-level diffusion planner and a low-level controller within an end-to-end coordinated training scheme, effectively mitigating distribution mismatch. The approach achieves over 90% task success rate in simulation—including on out-of-distribution scenarios—and demonstrates fluent autonomous execution in real-world dynamic environments. It significantly outperforms pretrained baselines, validating substantial improvements in both motion quality and task robustness.

Technology Category

Application Category

📝 Abstract
Humanoid loco-manipulation requires coordinated high-level motion plans with stable, low-level whole-body execution under complex robot-environment dynamics and long-horizon tasks. While diffusion policies (DPs) show promise for learning from demonstrations, deploying them on humanoids poses critical challenges: the motion planner trained offline is decoupled from the low-level controller, leading to poor command tracking, compounding distribution shift, and task failures. The common approach of scaling demonstration data is prohibitively expensive for high-dimensional humanoid systems. To address this challenge, we present REFINE-DP (REinforcement learning FINE-tuning of Diffusion Policy), a hierarchical framework that jointly optimizes a DP high-level planner and an RL-based low-level loco-manipulation controller. The DP is fine-tuned via a PPO-based diffusion policy gradient to improve task success rate, while the controller is simultaneously updated to accurately track the planner's evolving command distribution, reducing the distributional mismatch that degrades motion quality. We validate REFINE-DP on a humanoid robot performing loco-manipulation tasks, including door traversal and long-horizon object transport. REFINE-DP achieves an over $90\%$ success rate in simulation, even in out-of-distribution cases not seen in the pre-trained data, and enables smooth autonomous task execution in real-world dynamic environments. Our proposed method substantially outperforms pre-trained DP baselines and demonstrates that RL fine-tuning is key to reliable humanoid loco-manipulation. https://refine-dp.github.io/REFINE-DP/
Problem

Research questions and friction points this paper is trying to address.

humanoid loco-manipulation
diffusion policy
distribution shift
hierarchical control
command tracking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Policy
Reinforcement Learning Fine-tuning
Humanoid Loco-manipulation
Hierarchical Control
Distribution Shift Reduction
🔎 Similar Papers
No similar papers found.
Zhaoyuan Gu
Zhaoyuan Gu
Georgia Tech
Humanoid RoboticsArtificial Intelligence
Y
Yipu Chen
The Institute for Robotics and Intelligent Machines, Georgia Institute of Technology
Z
Zimeng Chai
The Institute for Robotics and Intelligent Machines, Georgia Institute of Technology
A
Alfred Cueva
The Institute for Robotics and Intelligent Machines, Georgia Institute of Technology
T
Thong Nguyen
The Institute for Robotics and Intelligent Machines, Georgia Institute of Technology
Y
Yifan Wu
The Institute for Robotics and Intelligent Machines, Georgia Institute of Technology
H
Huishu Xue
The Institute for Robotics and Intelligent Machines, Georgia Institute of Technology
M
Minji Kim
The Institute for Robotics and Intelligent Machines, Georgia Institute of Technology
I
Isaac Legene
The Institute for Robotics and Intelligent Machines, Georgia Institute of Technology
Fukang Liu
Fukang Liu
Institute of Science Tokyo
cryptography
Matthew Kim
Matthew Kim
Computer Science (B.S), UC San Diego
robot learningmulti-agent gamescontrol theory
A
Ayan Barula
The Institute for Robotics and Intelligent Machines, Georgia Institute of Technology
Yongxin Chen
Yongxin Chen
Georgia Institute of Technology
control theorymachine learningroboticsoptimal transportoptimization
Ye Zhao
Ye Zhao
Associate Professor, Mechanical Engineering, Georgia Tech
RoboticsFormal MethodsOptimizationTask and Motion PlanningHuman-robot Teaming