AI Summary
This work addresses the challenge that humanoid robots often exhibit degraded task performance when imitating human demonstrations, due to morphological discrepancies and retargeting errors, while pure reinforcement learning tends to yield unnatural motions. To this end, the authors propose TCMP (Task-Centric Motion Priors), a task-priority adversarial imitation learning framework that treats imitation as a conditional regularizer rather than an equally weighted objective. By prioritizing task success, TCMP adaptively integrates motion priors through a geometry-aware policy update grounded in an analysis of task-priority stationary points, mitigating gradient conflicts and keeping policy optimization along task-feasible directions. Experiments demonstrate that TCMP achieves both robust task performance and a natural, consistent motion style, even when trained on noisy, misaligned demonstration data.
Abstract
Humanoid control often leverages motion priors from human demonstrations to encourage natural behaviors. However, such demonstrations are frequently suboptimal or misaligned with robotic tasks due to embodiment differences, retargeting errors, and task-irrelevant variations, causing naïve imitation to degrade task performance. Conversely, task-only reinforcement learning admits many task-optimal solutions, often resulting in unnatural or unstable motions. This exposes a fundamental limitation of linear reward mixing in adversarial imitation learning. We propose Task-Centric Motion Priors (TCMP), a task-priority adversarial imitation framework that treats imitation as a conditional regularizer rather than a co-equal objective. TCMP maximizes task improvement while incorporating imitation signals only when they are compatible with task progress, yielding an adaptive, geometry-aware update that preserves task-feasible descent and suppresses harmful imitation under misalignment. We provide theoretical analysis of gradient conflict and task-priority stationary points, and validate our claims through humanoid control experiments demonstrating robust task performance with consistent motion style under noisy demonstrations.
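The abstract does not give the update rule itself, but the idea of "incorporating imitation signals only when they are compatible with task progress" can be illustrated with a minimal sketch. The function below is a hypothetical construction (not the paper's actual algorithm): it always follows the task gradient, blends the imitation gradient in when the two agree, and otherwise projects out the imitation component that opposes the task direction, in the spirit of conflict-aware gradient surgery. The function name, the blending weight `beta`, and the projection rule are all illustrative assumptions.

```python
def task_priority_update(g_task, g_imit, beta=0.5):
    """Hypothetical sketch of a task-priority gradient combination.

    g_task : list[float] -- gradient of the task objective (always followed)
    g_imit : list[float] -- gradient of the imitation objective (conditional)
    beta   : float       -- illustrative weight on the imitation term
    """
    # Inner product decides whether imitation is compatible with task progress.
    dot = sum(t * i for t, i in zip(g_task, g_imit))
    if dot >= 0.0:
        # Compatible: use imitation as a regularizer alongside the task gradient.
        return [t + beta * i for t, i in zip(g_task, g_imit)]
    # Conflicting: keep only the task-orthogonal part of the imitation gradient,
    # so the combined step still makes progress on the task objective.
    tt = sum(t * t for t in g_task) + 1e-12  # small constant guards division
    return [t + beta * (i - (dot / tt) * t) for t, i in zip(g_task, g_imit)]
```

Under this sketch, the combined update always has a nonnegative inner product with the task gradient, which is one simple way to realize "preserves task-feasible descent and suppresses harmful imitation under misalignment."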