🤖 AI Summary
Humanoid robots remain brittle on long-duration, highly dynamic motion tracking (e.g., jumps, spins, cartwheels) because joint command errors caused by model–reality discrepancies accumulate over time. To address this, we propose a residual-action reinforcement learning framework: it trains with a single-stage, end-to-end RL pipeline under one unified observation, reward, and hyperparameter configuration; introduces a residual joint target prediction mechanism that explicitly corrects model mismatch within the action space; and supports both sim-to-sim validation and zero-shot sim-to-real transfer. Experiments on Unitree G1 and H1/H1-2 platforms demonstrate that our method accurately reproduces multi-minute, high-energy dance sequences. Crucially, zero-shot deployment on physical robots achieves high motion fidelity without fine-tuning, significantly improving long-horizon tracking robustness and cross-platform generalization for dynamic locomotion.
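To make the residual-action idea concrete, here is a minimal sketch of residual joint-target control over a PD-driven humanoid. All names (`q_ref_t`, `ACTION_SCALE`, `kp`, `kd`) and values are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

# Assumed constants for illustration only.
ACTION_SCALE = 0.25   # limits residual magnitude around the reference pose (assumed)
kp, kd = 100.0, 2.0   # per-joint PD gains (assumed)

def joint_targets(q_ref_t: np.ndarray, residual_action: np.ndarray) -> np.ndarray:
    """The policy outputs a residual around the retargeted reference pose,
    so a zero action reproduces the reference motion exactly, and nonzero
    actions absorb model-plant mismatch."""
    return q_ref_t + ACTION_SCALE * np.clip(residual_action, -1.0, 1.0)

def pd_torque(q_target: np.ndarray, q: np.ndarray, dq: np.ndarray) -> np.ndarray:
    """Low-level PD loop tracking the corrected joint targets."""
    return kp * (q_target - q) - kd * dq
```

One appeal of this parameterization is that the residual directly encodes the correction needed at each timestep, so errors need not accumulate in an absolute joint-command frame.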
📝 Abstract
Long-horizon, highly dynamic motion tracking on humanoids remains brittle because absolute joint commands cannot compensate for model–plant mismatch, leading to error accumulation. We propose RobotDancing, a simple, scalable framework that predicts residual joint targets to explicitly correct dynamics discrepancies. The pipeline is end-to-end (training, sim-to-sim validation, and zero-shot sim-to-real) and uses a single-stage reinforcement learning (RL) setup with a unified observation, reward, and hyperparameter configuration. We evaluate primarily on the Unitree G1 with retargeted LAFAN1 dance sequences and validate transfer on H1/H1-2. RobotDancing can track multi-minute, high-energy behaviors (jumps, spins, cartwheels) and deploys zero-shot to hardware with high motion-tracking quality.
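To illustrate what a "unified observation, reward, and hyperparameter configuration" might look like, below is a hypothetical single-file setup in the style common to RL locomotion codebases. Every field name and value is an assumption for illustration, not the paper's actual settings:

```python
# Hypothetical single-stage training configuration: one observation spec,
# one reward spec, and one hyperparameter set shared across all motions
# and platforms, rather than per-motion or per-stage tuning.
CONFIG = {
    "observation": [
        "joint_pos", "joint_vel", "base_ang_vel",
        "projected_gravity", "ref_motion_phase", "last_action",
    ],
    "reward_weights": {
        "joint_pos_tracking": 1.0,     # track retargeted reference joints
        "body_pos_tracking": 0.5,      # track key-body positions
        "action_rate_penalty": -0.01,  # smooth residual actions
        "torque_penalty": -1e-4,       # discourage excessive effort
    },
    "rl": {"lr": 1e-3, "gamma": 0.99, "num_envs": 4096},
}
```

Keeping a single configuration across G1 and H1/H1-2 is what would let the same recipe transfer across platforms without retuning.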