Opt2Skill: Imitating Dynamically-feasible Whole-Body Trajectories for Versatile Humanoid Loco-Manipulation

📅 2024-09-30
🏛️ arXiv.org
📈 Citations: 11
Influential: 0
🤖 AI Summary
Humanoid robots face a fundamental trade-off between control accuracy and robustness in loco-manipulation, stemming from their high-dimensional, unstable dynamics and complex multi-contact interactions. Opt2Skill addresses this with a hybrid pipeline that couples model-based trajectory optimization with reinforcement learning: torque-limited differential dynamic programming (DDP) generates dynamically feasible, natural whole-body reference trajectories, and RL policies are trained to track them. Evaluated on the Digit humanoid robot, the approach outperforms pure RL baselines in both training efficiency and task performance, with torque-aware reference trajectories further improving tracking fidelity, and it transfers successfully to real-world loco-manipulation tasks.

📝 Abstract
Humanoid robots are designed to perform diverse loco-manipulation tasks. However, they face challenges due to their high-dimensional and unstable dynamics, as well as the complex contact-rich nature of the tasks. Model-based optimal control methods offer precise and systematic control but are limited by high computational complexity and accurate contact sensing. On the other hand, reinforcement learning (RL) provides robustness and handles high-dimensional spaces but suffers from inefficient learning, unnatural motion, and sim-to-real gaps. To address these challenges, we introduce Opt2Skill, an end-to-end pipeline that combines model-based trajectory optimization with RL to achieve robust whole-body loco-manipulation. We generate reference motions for the Digit humanoid robot using differential dynamic programming (DDP) and train RL policies to track these trajectories. Our results demonstrate that Opt2Skill outperforms pure RL methods in both training efficiency and task performance, with optimal trajectories that account for torque limits enhancing trajectory tracking. We successfully transfer our approach to real-world applications.
Problem

Research questions and friction points this paper is trying to address.

High-dimensional unstable dynamics in humanoid robots
Complex, contact-rich loco-manipulation tasks
Trade-offs between model-based control and reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines trajectory optimization with reinforcement learning
Uses DDP to generate dynamically feasible reference motions
Improves contact force tracking with torque information
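The tracking objective behind these bullets can be sketched as an exponential-kernel imitation reward, a common choice in motion-tracking RL: the policy is rewarded for matching both the DDP reference positions and the reference torques. This is a minimal illustrative sketch, not the paper's exact formulation; the function name, weights, and reward structure below are assumptions.

```python
import numpy as np

def tracking_reward(q, q_ref, tau, tau_ref, w_q=5.0, w_tau=0.5):
    """Exponential tracking reward (illustrative, not from the paper).

    q, q_ref     -- measured and reference joint positions
    tau, tau_ref -- applied and reference joint torques from the
                    torque-limited DDP trajectory
    Each term is 1.0 at perfect tracking and decays with squared error,
    so the maximum total reward is 2.0.
    """
    r_q = np.exp(-w_q * np.sum((q - q_ref) ** 2))
    r_tau = np.exp(-w_tau * np.sum((tau - tau_ref) ** 2))
    return r_q + r_tau

# Perfect tracking of a 4-DoF reference yields the maximum reward.
q_ref = np.zeros(4)
tau_ref = np.zeros(4)
print(tracking_reward(q_ref, q_ref, tau_ref, tau_ref))  # → 2.0
```

Including the torque term is what the third bullet hints at: by rewarding agreement with the optimized torques, the policy is pushed toward the contact-force profile implied by the DDP solution rather than only its kinematics.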
Fukang Liu
Institute of Science Tokyo
cryptography
Zhaoyuan Gu
Georgia Tech
Humanoid Robotics, Artificial Intelligence
Yilin Cai
Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, Atlanta, GA, USA
Ziyi Zhou
Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, Atlanta, GA, USA
Shijie Zhao
Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, Atlanta, GA, USA
Hyunyoung Jung
Meta
deep learning, computer vision
Sehoon Ha
Georgia Institute of Technology
robotics, computer graphics, machine learning
Yue Chen
Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, Atlanta, GA, USA
Danfei Xu
Assistant Professor at School of Interactive Computing
Robot Learning, Computer Vision
Ye Zhao
Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, Atlanta, GA, USA