🤖 AI Summary
Humanoid robots remain brittle on long-duration, highly dynamic motion tracking (e.g., jumps, spins, cartwheels) because joint command errors caused by model–reality discrepancies accumulate over time. To address this, we propose a residual-action reinforcement learning framework: it trains with a single-stage, end-to-end RL pipeline under one unified observation, reward, and hyperparameter configuration; introduces a residual joint target prediction mechanism that explicitly corrects model mismatch within the action space; and supports both sim-to-sim validation and zero-shot sim-to-real transfer. Experiments on Unitree G1 and H1/H1-2 platforms demonstrate that our method accurately reproduces multi-minute, high-energy dance sequences. Crucially, zero-shot deployment on physical robots achieves high motion fidelity without fine-tuning, significantly improving long-horizon tracking robustness and cross-platform generalization for dynamic locomotion.
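To make the residual-action idea concrete, here is a minimal sketch of residual joint-target control over a PD-driven humanoid. All names (`q_ref_t`, `ACTION_SCALE`, `kp`, `kd`) and values are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

# Assumed constants for illustration only.
ACTION_SCALE = 0.25   # limits residual magnitude around the reference pose (assumed)
kp, kd = 100.0, 2.0   # per-joint PD gains (assumed)

def joint_targets(q_ref_t: np.ndarray, residual_action: np.ndarray) -> np.ndarray:
    """The policy outputs a residual around the retargeted reference pose,
    so a zero action reproduces the reference motion exactly, and nonzero
    actions absorb model-plant mismatch."""
    return q_ref_t + ACTION_SCALE * np.clip(residual_action, -1.0, 1.0)

def pd_torque(q_target: np.ndarray, q: np.ndarray, dq: np.ndarray) -> np.ndarray:
    """Low-level PD loop tracking the corrected joint targets."""
    return kp * (q_target - q) - kd * dq
```

One appeal of this parameterization is that the residual directly encodes the correction needed at each timestep, so errors need not accumulate in an absolute joint-command frame.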
📝 Abstract
Long-horizon, highly dynamic motion tracking on humanoids remains brittle because absolute joint commands cannot compensate for model–plant mismatch, leading to error accumulation. We propose RobotDancing, a simple, scalable framework that predicts residual joint targets to explicitly correct dynamics discrepancies. The pipeline is end-to-end (training, sim-to-sim validation, and zero-shot sim-to-real) and uses a single-stage reinforcement learning (RL) setup with a unified observation, reward, and hyperparameter configuration. We evaluate primarily on the Unitree G1 with retargeted LAFAN1 dance sequences and validate transfer on H1/H1-2. RobotDancing can track multi-minute, high-energy behaviors (jumps, spins, cartwheels) and deploys zero-shot to hardware with high motion-tracking quality.
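To illustrate what a "unified observation, reward, and hyperparameter configuration" might look like, below is a hypothetical single-file setup in the style common to RL locomotion codebases. Every field name and value is an assumption for illustration, not the paper's actual settings:

```python
# Hypothetical single-stage training configuration: one observation spec,
# one reward spec, and one hyperparameter set shared across all motions
# and platforms, rather than per-motion or per-stage tuning.
CONFIG = {
    "observation": [
        "joint_pos", "joint_vel", "base_ang_vel",
        "projected_gravity", "ref_motion_phase", "last_action",
    ],
    "reward_weights": {
        "joint_pos_tracking": 1.0,     # track retargeted reference joints
        "body_pos_tracking": 0.5,      # track key-body positions
        "action_rate_penalty": -0.01,  # smooth residual actions
        "torque_penalty": -1e-4,       # discourage excessive effort
    },
    "rl": {"lr": 1e-3, "gamma": 0.99, "num_envs": 4096},
}
```

Keeping a single configuration across G1 and H1/H1-2 is what would let the same recipe transfer across platforms without retuning.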