AI Summary
This work addresses the challenge that humanoid robots often exhibit degraded task performance when imitating human demonstrations, due to morphological discrepancies and retargeting errors, while pure reinforcement learning tends to yield unnatural motions. To this end, the authors propose TCMP (Task-Centric Motion Priors), a task-priority adversarial imitation learning framework that treats imitation as a conditional regularizer rather than an equally weighted objective. By prioritizing task success, TCMP adaptively integrates motion priors through a geometry-aware policy update grounded in an analysis of task-priority stationary points, mitigating gradient conflicts and keeping policy optimization along task-feasible directions. Experiments demonstrate that TCMP achieves both robust task performance and a natural, consistent motion style, even when trained on noisy, misaligned demonstration data.
Abstract
Humanoid control often leverages motion priors from human demonstrations to encourage natural behaviors. However, such demonstrations are frequently suboptimal or misaligned with robotic tasks due to embodiment differences, retargeting errors, and task-irrelevant variations, causing naïve imitation to degrade task performance. Conversely, task-only reinforcement learning admits many task-optimal solutions, often resulting in unnatural or unstable motions. This exposes a fundamental limitation of linear reward mixing in adversarial imitation learning. We propose Task-Centric Motion Priors (TCMP), a task-priority adversarial imitation framework that treats imitation as a conditional regularizer rather than a co-equal objective. TCMP maximizes task improvement while incorporating imitation signals only when they are compatible with task progress, yielding an adaptive, geometry-aware update that preserves task-feasible descent and suppresses harmful imitation under misalignment. We provide theoretical analysis of gradient conflict and task-priority stationary points, and validate our claims through humanoid control experiments demonstrating robust task performance with consistent motion style under noisy demonstrations.
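The abstract does not give the update rule itself, but the idea of "incorporating imitation signals only when they are compatible with task progress" can be illustrated with a minimal sketch. The function below is a hypothetical construction (not the paper's actual algorithm): it always follows the task gradient, blends the imitation gradient in when the two agree, and otherwise projects out the imitation component that opposes the task direction, in the spirit of conflict-aware gradient surgery. The function name, the blending weight `beta`, and the projection rule are all illustrative assumptions.

```python
def task_priority_update(g_task, g_imit, beta=0.5):
    """Hypothetical sketch of a task-priority gradient combination.

    g_task : list[float] -- gradient of the task objective (always followed)
    g_imit : list[float] -- gradient of the imitation objective (conditional)
    beta   : float       -- illustrative weight on the imitation term
    """
    # Inner product decides whether imitation is compatible with task progress.
    dot = sum(t * i for t, i in zip(g_task, g_imit))
    if dot >= 0.0:
        # Compatible: use imitation as a regularizer alongside the task gradient.
        return [t + beta * i for t, i in zip(g_task, g_imit)]
    # Conflicting: keep only the task-orthogonal part of the imitation gradient,
    # so the combined step still makes progress on the task objective.
    tt = sum(t * t for t in g_task) + 1e-12  # small constant guards division
    return [t + beta * (i - (dot / tt) * t) for t, i in zip(g_task, g_imit)]
```

Under this sketch, the combined update always has a nonnegative inner product with the task gradient, which is one simple way to realize "preserves task-feasible descent and suppresses harmful imitation under misalignment."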