Integrating Diffusion-based Multi-task Learning with Online Reinforcement Learning for Robust Quadruped Robot Control

📅 2025-07-08
🤖 AI Summary
To address unstable task switching under limited data, compounding errors in imitation learning, and insufficient robustness in online control for quadruped robots, this paper proposes DMLoco, a framework that combines language-conditioned diffusion models for multi-task policy pretraining with online PPO fine-tuning to enable end-to-end, language-guided locomotion control. The method uses DDIM-accelerated sampling and TensorRT-optimized deployment, allowing the policy to run onboard in real time at 50 Hz. Experiments show improved sample efficiency and cross-task generalization. DMLoco achieves multi-skill generation, smooth task transitions, and low-latency online adaptation, validated both in simulation and on real quadruped platforms. The result is a scalable, deployable approach to language-guided locomotion on resource-constrained robotic platforms.

📝 Abstract
Recent research has highlighted the powerful capabilities of imitation learning in robotics. Leveraging generative models, particularly diffusion models, these approaches offer notable advantages such as strong multi-task generalization, effective language conditioning, and high sample efficiency. While their application has been successful in manipulation tasks, their use in legged locomotion remains relatively underexplored, mainly due to compounding errors that affect stability and difficulties in task transition under limited data. Online reinforcement learning (RL) has demonstrated promising results in legged robot control in recent years, providing valuable insights to address these challenges. In this work, we propose DMLoco, a diffusion-based framework for quadruped robots that integrates multi-task pretraining with online PPO finetuning to enable language-conditioned control and robust task transitions. Our approach first pretrains the policy on a diverse multi-task dataset using diffusion models, enabling language-guided execution of various skills. Then, it finetunes the policy in simulation to ensure robustness and stable task transition during real-world deployment. By utilizing Denoising Diffusion Implicit Models (DDIM) for efficient sampling and TensorRT for optimized deployment, our policy runs onboard at 50 Hz, offering a scalable and efficient solution for adaptive, language-guided locomotion on resource-constrained robotic platforms.
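
The abstract credits DDIM with making sampling fast enough for onboard use. As a rough illustration of why, the sketch below runs deterministic DDIM (eta = 0) over a small strided subset of the training timesteps instead of all of them. It is not DMLoco's implementation: the function names, the linear beta schedule, the 1000/10 step counts, the `"trot"` command string, and the toy "oracle" noise predictor are all assumptions made for the example, and a real policy would replace the oracle with a trained, language-conditioned network.

```python
import math
import random

def make_alpha_bars(num_train_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for t in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_train_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def ddim_sample(eps_model, cond, dim, alpha_bars, num_sample_steps, rng):
    """Deterministic DDIM (eta = 0); requires num_sample_steps >= 2."""
    T = len(alpha_bars)
    # Evenly strided timesteps, visited from noisiest to cleanest.
    timesteps = [round(i * (T - 1) / (num_sample_steps - 1))
                 for i in range(num_sample_steps)][::-1]
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from Gaussian noise
    for i, t in enumerate(timesteps):
        a_t = alpha_bars[t]
        # After the last step we land on the clean sample (alpha_bar = 1).
        a_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps = eps_model(x, t, cond)
        # Predicted clean sample implied by the current noise estimate.
        x0 = [(xi - math.sqrt(1.0 - a_t) * ei) / math.sqrt(a_t)
              for xi, ei in zip(x, eps)]
        # Jump directly to the previous (less noisy) strided timestep.
        x = [math.sqrt(a_prev) * x0i + math.sqrt(1.0 - a_prev) * ei
             for x0i, ei in zip(x0, eps)]
    return x

def make_oracle_eps(target, alpha_bars):
    """Toy stand-in for a trained network: returns the exact noise for a known target."""
    def eps_model(x, t, cond):
        a_t = alpha_bars[t]
        return [(xi - math.sqrt(a_t) * ti) / math.sqrt(1.0 - a_t)
                for xi, ti in zip(x, target)]
    return eps_model

alpha_bars = make_alpha_bars(1000)   # hypothetical training horizon
target = [0.3, -0.5, 0.1]            # hypothetical 3-D action
oracle = make_oracle_eps(target, alpha_bars)
action = ddim_sample(oracle, "trot", 3, alpha_bars, 10, random.Random(0))
```

With a perfect noise predictor the ten-step sampler recovers the target action, which shows the update rule is consistent. With a learned predictor, using fewer steps trades some sample quality for lower latency, which is the trade-off that matters for a fixed control rate.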

Problem

Research questions and friction points this paper is trying to address.

Integrating diffusion models for robust quadruped robot control
Addressing stability and task transition in legged locomotion
Enabling language-conditioned control with efficient onboard execution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based multi-task pretraining for diverse skills
Online PPO finetuning for robust task transitions
DDIM and TensorRT for efficient real-time execution
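
The PPO finetuning item in the list above relies on PPO's clipped surrogate objective, which limits how far each update can move the policy away from the one that collected the data. The sketch below shows that standard objective only. How DMLoco computes action log-probabilities for a diffusion policy is not described here, so `new_logp`, `old_logp`, and the `clip_eps` default of 0.2 are illustrative assumptions.

```python
import math

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate, averaged over samples (lower is better)."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)

# Identical policies: the loss reduces to the negative mean advantage.
loss_same = ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, -2.0])

# Large ratio with positive advantage: the gain is capped at 1 + clip_eps.
loss_capped = ppo_clip_loss([1.0], [0.0], [1.0])
```

The cap on the second case is what keeps the pretrained behavior from being overwritten in a single large update, which is relevant when a policy is finetuned from an imitation-learned starting point.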

Authors

Xinyao Qin
Department of Automation, Tsinghua University
Xiaoteng Ma
Department of Automation, Tsinghua University
Yang Qi
Fudan University, Shanghai, China
Computational Neuroscience
Qihan Liu
Tsinghua University
Chuanyi Xue
Department of Automation, Tsinghua University
Ning Gui
Department of Automation, Tsinghua University
Qinyu Dong
Department of Automation, Tsinghua University
Jun Yang
Department of Automation, Tsinghua University
Bin Liang
Department of Automation, Tsinghua University