RobotDancing: Residual-Action Reinforcement Learning Enables Robust Long-Horizon Humanoid Motion Tracking

📅 2025-09-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Humanoid robots exhibit insufficient robustness in long-duration, highly dynamic motion tracking (e.g., jumping, spinning, cartwheels) because joint-command errors accumulate from model–reality discrepancies. To address this, we propose a residual-action reinforcement learning framework: it employs an end-to-end, single-stage RL pipeline with a unified observation, reward, and hyperparameter configuration; introduces a residual joint-target prediction mechanism that explicitly encodes model mismatch in the action space; and supports both sim-to-sim validation and zero-shot sim-to-real transfer. Experiments on the Unitree G1 and H1/H1-2 platforms demonstrate that the method accurately reproduces multi-minute, high-energy dance sequences. Crucially, zero-shot deployment on physical robots achieves high motion fidelity without fine-tuning, significantly improving long-horizon tracking robustness and cross-platform generalization for dynamic locomotion.

📝 Abstract
Long-horizon, high-dynamic motion tracking on humanoids remains brittle because absolute joint commands cannot compensate model-plant mismatch, leading to error accumulation. We propose RobotDancing, a simple, scalable framework that predicts residual joint targets to explicitly correct dynamics discrepancies. The pipeline is end-to-end (training, sim-to-sim validation, and zero-shot sim-to-real) and uses a single-stage reinforcement learning (RL) setup with a unified observation, reward, and hyperparameter configuration. We evaluate primarily on Unitree G1 with retargeted LAFAN1 dance sequences and validate transfer on H1/H1-2. RobotDancing can track multi-minute, high-energy behaviors (jumps, spins, cartwheels) and deploys zero-shot to hardware with high motion tracking quality.
Problem

Research questions and friction points this paper is trying to address.

Addresses error accumulation in humanoid motion tracking
Corrects dynamics discrepancies using residual joint targets
Enables robust long-horizon motion tracking on hardware
Innovation

Methods, ideas, or system contributions that make the work stand out.

Predicts residual joint targets that encode model mismatch in the action space
Uses single-stage reinforcement learning setup
Enables zero-shot sim-to-real deployment
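The residual-action idea above can be sketched in a few lines: rather than emitting absolute joint commands, the policy outputs a small correction that is added to the retargeted reference pose. This is a minimal illustration, not the paper's implementation; the function name, the residual scale, and the example values are assumptions.

```python
import numpy as np

def residual_joint_targets(q_ref, residual_action, residual_scale=0.25):
    """Combine reference joint angles with a policy-predicted residual.

    The policy corrects model-plant mismatch in the action space by
    predicting a bounded residual around the reference motion, instead
    of predicting absolute joint targets. The scale factor here is an
    illustrative assumption, not the paper's exact configuration.
    """
    residual = residual_scale * np.asarray(residual_action)
    return np.asarray(q_ref) + residual

# Example: reference pose from a retargeted dance frame plus a residual.
q_ref = np.array([0.10, -0.30, 0.55])   # reference joint angles (rad)
action = np.array([0.2, -0.1, 0.0])     # raw policy output in [-1, 1]
q_target = residual_joint_targets(q_ref, action)
```

Because the residual is small and centered on the reference motion, the tracked pose stays close to the demonstration even when the simulator's dynamics differ from the real robot's.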
Zhenguo Sun
School of Computation, Information and Technology, Technical University of Munich, 85748 Munich, Germany
Yibo Peng
Carnegie Mellon University
Code Generation · Multimodal NLP · AI Agents
Yuan Meng
School of Computation, Information and Technology, Technical University of Munich, 85748 Munich, Germany
Xukun Li
Kansas State University
computer vision · machine learning · deep learning · statistical modeling
Bo-Sheng Huang
Department of Computer Science, Tsinghua University, 100084 Beijing, China
Zhenshan Bing
Nanjing University / Technical University of Munich
Robotics
Xinlong Wang
Beijing Academy of Artificial Intelligence, 100084 Beijing, China
Alois Knoll
Technische Universität München
Robotics · AI · Sensor Data Fusion · Autonomous Driving · Cyber Physical Systems