Task-Centric Policy Optimization from Misaligned Motion Priors

📅 2026-01-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge that humanoid robots often exhibit degraded task performance when imitating human demonstrations, owing to morphological discrepancies and retargeting errors, while pure reinforcement learning tends to yield unnatural motions. To this end, the authors propose TCMP (Task-Centric Motion Priors), a task-priority adversarial imitation learning framework that treats imitation as a conditional regularizer rather than an equally weighted objective. By prioritizing task success, TCMP adaptively integrates motion priors through a geometry-aware policy update grounded in an analysis of task-priority stationary points, mitigating gradient conflicts and keeping policy optimization along task-feasible directions. Experiments demonstrate that TCMP achieves both robust task performance and natural, consistent motion styles, even when trained on noisy, misaligned demonstration data.

πŸ“ Abstract
Humanoid control often leverages motion priors from human demonstrations to encourage natural behaviors. However, such demonstrations are frequently suboptimal or misaligned with robotic tasks due to embodiment differences, retargeting errors, and task-irrelevant variations, causing naïve imitation to degrade task performance. Conversely, task-only reinforcement learning admits many task-optimal solutions, often resulting in unnatural or unstable motions. This exposes a fundamental limitation of linear reward mixing in adversarial imitation learning. We propose *Task-Centric Motion Priors* (TCMP), a task-priority adversarial imitation framework that treats imitation as a conditional regularizer rather than a co-equal objective. TCMP maximizes task improvement while incorporating imitation signals only when they are compatible with task progress, yielding an adaptive, geometry-aware update that preserves task-feasible descent and suppresses harmful imitation under misalignment. We provide theoretical analysis of gradient conflict and task-priority stationary points, and validate our claims through humanoid control experiments demonstrating robust task performance with consistent motion style under noisy demonstrations.
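The paper does not spell out its update rule in this summary, but the idea of incorporating imitation signals "only when they are compatible with task progress" can be illustrated with a minimal NumPy sketch of one common geometry-aware scheme (in the spirit of gradient-projection conflict resolution): when the imitation gradient opposes the task gradient, its conflicting component is projected out so the combined step remains a task-feasible direction. The function name, the weight `lam`, and the projection rule are illustrative assumptions, not TCMP's exact formulation.

```python
import numpy as np

def task_centric_update(g_task, g_imit, lam=0.5):
    """Combine task and imitation gradients, keeping only the
    imitation component compatible with task progress.

    Illustrative projection scheme, not the paper's exact rule.
    """
    g_task = np.asarray(g_task, dtype=float)
    g_imit = np.asarray(g_imit, dtype=float)
    dot = g_task @ g_imit
    if dot >= 0.0:
        # No conflict: imitation acts as a plain regularizer.
        return g_task + lam * g_imit
    # Conflict: strip the component of g_imit that opposes the
    # task direction, so the update stays task-feasible.
    g_proj = g_imit - (dot / (g_task @ g_task + 1e-12)) * g_task
    return g_task + lam * g_proj
```

Either branch guarantees the resulting step has a non-negative inner product with the task gradient, which is the "task-priority" property the abstract describes.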
Problem

Research questions and friction points this paper is trying to address.

humanoid control
motion priors
imitation learning
task misalignment
reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Task-Centric Motion Priors
Adversarial Imitation Learning
Gradient Conflict
Conditional Regularization
Humanoid Control
Ziang Zheng
Master, Tsinghua University
Reinforcement Learning, Robotics, Animation, Federated Learning
Kai Feng
Northwestern Polytechnical University
Computational imaging, spectral imaging, deep learning
Yi Nie
The Department of Automation, Tsinghua University, Beijing, China
Shentao Qin
School of Vehicle and Mobility, Tsinghua University, Beijing, China