Benchmarking Humanoid Imitation Learning with Motion Difficulty

📅 2025-12-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing imitation learning evaluation metrics, such as joint position error, reflect only policy performance and fail to disentangle intrinsic motion difficulty, making it hard to tell whether failures stem from insufficient learning capacity or from inherently complex motions. Method: We propose the Motion Difficulty Score (MDS), the first metric grounded in rigid-body dynamics, quantifying control difficulty via the spatial volume, variance, and temporal variability of torque responses under small pose perturbations. We use MDS to construct the difficulty-aware MD-AMASS dataset and to define two derived metrics, Maximum Imitable Difficulty (MID) and Difficulty-Stratified Joint Error (DSJE). Contribution/Results: Experiments demonstrate that MDS effectively explains performance disparities across mainstream imitation learning methods on diverse tasks, exhibiting strong interpretability and cross-task generalizability. This work establishes the first systematic, quantitative framework for assessing motion difficulty in imitation learning.
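
As a rough illustration of how such a score could be computed, the sketch below samples small pose perturbations per frame, queries an inverse-dynamics routine for the resulting torques, and summarizes the torque responses by volume, variance, and temporal variability. The `inverse_dynamics` callable, the perturbation scale `eps`, the sample count, and the unweighted sum at the end are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def motion_difficulty_score(poses, inverse_dynamics, n_perturb=16, eps=1e-2):
    """Illustrative MDS-style score for one motion clip.

    poses            : (T, D) array of reference joint positions per frame.
    inverse_dynamics : callable mapping a (D,) pose to a (D,) torque vector
                       (e.g. backed by a rigid-body dynamics library).
    Returns a scalar combining volume, variance, and temporal variability
    of the perturbation-induced torque responses.
    """
    T, D = poses.shape
    torque_samples = []
    for t in range(T):
        # Sample small pose perturbations around the reference frame
        # and collect the torques they induce.
        deltas = eps * np.random.randn(n_perturb, D)
        torques = np.stack([inverse_dynamics(poses[t] + d) for d in deltas])
        torque_samples.append(torques)
    torque_samples = np.stack(torque_samples)            # (T, n_perturb, D)

    # Volume: spread of the perturbation-induced torque cloud per frame
    # (log-determinant of its covariance), averaged over time.
    covs = [np.cov(ts, rowvar=False) + 1e-8 * np.eye(D) for ts in torque_samples]
    volume = np.mean([np.linalg.slogdet(c)[1] for c in covs])

    # Variance: average per-joint torque variance under perturbation.
    variance = torque_samples.var(axis=1).mean()

    # Temporal variability: frame-to-frame change of the mean torque response.
    mean_torque = torque_samples.mean(axis=1)             # (T, D)
    temporal = np.linalg.norm(np.diff(mean_torque, axis=0), axis=1).mean()

    # Simple unweighted combination; the paper's aggregation may differ.
    return volume + variance + temporal
```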

📝 Abstract
Physics-based motion imitation is central to humanoid control, yet current evaluation metrics (e.g., joint position error) only measure how well a policy imitates but not how difficult the motion itself is. This conflates policy performance with motion difficulty, obscuring whether failures stem from poor learning or inherently challenging motions. In this work, we address this gap with Motion Difficulty Score (MDS), a novel metric that defines and quantifies imitation difficulty independent of policy performance. Grounded in rigid-body dynamics, MDS interprets difficulty as the torque variation induced by small pose perturbations: larger torque-to-pose variation yields flatter reward landscapes and thus higher learning difficulty. MDS captures this through three properties of the perturbation-induced torque space: volume, variance, and temporal variability. We also use it to construct MD-AMASS, a difficulty-aware repartitioning of the AMASS dataset. Empirically, we rigorously validate MDS by demonstrating its explanatory power on the performance of state-of-the-art motion imitation policies. We further demonstrate the utility of MDS through two new MDS-based metrics: Maximum Imitable Difficulty (MID) and Difficulty-Stratified Joint Error (DSJE), providing fresh insights into imitation learning.
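
The abstract describes MD-AMASS as a difficulty-aware repartitioning of AMASS. Below is a minimal sketch of one way such a repartitioning could look, assuming one MDS value per clip and quantile-based difficulty tiers; the actual split scheme used for MD-AMASS may differ.

```python
import numpy as np

def repartition_by_difficulty(clip_ids, mds_scores, n_tiers=3):
    """Split motion clips into difficulty tiers by their MDS (MD-AMASS-style).

    clip_ids   : list of clip identifiers (e.g. AMASS sequence names).
    mds_scores : one MDS value per clip.
    Returns a dict mapping tier index (0 = easiest) to a list of clip ids.
    """
    mds_scores = np.asarray(mds_scores)
    # Quantile edges give roughly equally populated tiers.
    edges = np.quantile(mds_scores, np.linspace(0.0, 1.0, n_tiers + 1))
    tiers = np.clip(np.digitize(mds_scores, edges[1:-1]), 0, n_tiers - 1)
    return {t: [cid for cid, k in zip(clip_ids, tiers) if k == t]
            for t in range(n_tiers)}
```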
Problem

Research questions and friction points this paper is trying to address.

Standard metrics such as joint position error measure how well a policy imitates, not how difficult the motion itself is
Conflating policy performance with motion difficulty obscures whether failures stem from poor learning or from inherently challenging motions
No existing metric quantifies motion imitation difficulty independently of the evaluated policy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Motion Difficulty Score quantifies imitation difficulty independently
Metric uses torque variation from pose perturbations
Introduces difficulty-aware dataset repartitioning (MD-AMASS) and new evaluation metrics, MID and DSJE (see the sketch below)
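
A minimal sketch of how the two derived metrics might be computed from per-motion MDS values and per-motion policy tracking errors; the quantile binning, bin count, and error threshold are assumptions for illustration, not the paper's exact definitions.

```python
import numpy as np

def difficulty_stratified_joint_error(mds_scores, joint_errors, n_bins=5):
    """Average joint error within MDS difficulty bins (DSJE-style report)."""
    mds_scores = np.asarray(mds_scores)
    joint_errors = np.asarray(joint_errors)
    edges = np.quantile(mds_scores, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.digitize(mds_scores, edges[1:-1]), 0, n_bins - 1)
    return np.array([joint_errors[bins == b].mean() for b in range(n_bins)])

def maximum_imitable_difficulty(mds_scores, joint_errors, err_threshold):
    """Highest MDS among motions the policy still tracks under the threshold."""
    mds_scores = np.asarray(mds_scores)
    ok = np.asarray(joint_errors) <= err_threshold
    return mds_scores[ok].max() if ok.any() else float("-inf")
```

Reporting error per difficulty bin, rather than a single dataset-wide average, is what lets these metrics separate easy-motion performance from hard-motion performance for a given policy.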