Generalizing from References using a Multi-Task Reference and Goal-Driven RL Framework

📅 2026-02-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limited generalization of existing reference-trajectory-tracking methods and the tendency of purely task-driven reinforcement learning to compromise motion naturalness. The authors propose a unified multi-task reinforcement learning framework that treats reference motions as behavioral priors rather than runtime constraints. By jointly training a reference-guided imitation task and a goal-driven generalization task, which share observation and action spaces but employ distinct initializations, command spaces, and reward designs, the approach enables agents to learn structured skills from reference motions without relying on adversarial training, explicit trajectory tracking, phase variables, or runtime reference inputs. Experiments demonstrate that the resulting policy generates natural, diverse, human-like locomotion in complex parkour environments and supports long-horizon execution of multi-skill sequences across novel goals and initial states.

📝 Abstract
Learning agile humanoid behaviors from human motion offers a powerful route to natural, coordinated control, but existing approaches face a persistent trade-off: reference-tracking policies are often brittle outside the demonstration dataset, while purely task-driven Reinforcement Learning (RL) can achieve adaptability at the cost of motion quality. We introduce a unified multi-task RL framework that bridges this gap by treating reference motion as a prior for behavioral shaping rather than a deployment-time constraint. A single goal-conditioned policy is trained jointly on two tasks that share the same observation and action spaces, but differ in their initialization schemes, command spaces, and reward structures: (i) a reference-guided imitation task in which reference trajectories define dense imitation rewards but are not provided as policy inputs, and (ii) a goal-conditioned generalization task in which goals are sampled independently of any reference and where rewards reflect only task success. By co-optimizing these objectives within a shared formulation, the policy acquires structured, human-like motor skills from dense reference supervision while learning to adapt these skills to novel goals and initial conditions. This is achieved without adversarial objectives, explicit trajectory tracking, phase variables, or reference-dependent inference. We evaluate the method on a challenging box-based parkour playground that demands diverse athletic behaviors (e.g., jumping and climbing), and show that the learned controller transfers beyond the reference distribution while preserving motion naturalness. Finally, we demonstrate long-horizon behavior generation by composing multiple learned skills, illustrating the flexibility of the learned policies in complex scenarios.
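The two-task co-training scheme the abstract describes can be sketched in a few lines: one shared goal-conditioned policy, and two rollout modes that differ only in initialization, command sampling, and reward. This is a minimal illustrative toy in one dimension, assuming nothing about the authors' actual code; all function and variable names here are hypothetical.

```python
import random

def policy(obs, goal):
    """Placeholder goal-conditioned policy: step toward the commanded goal.
    Note the reference motion is never an input here."""
    return 1.0 if goal > obs else -1.0

def imitation_reward(state, ref_state):
    # Dense shaping reward: negative distance to the reference state.
    # The reference supervises training but is not observed by the policy.
    return -abs(state - ref_state)

def goal_reward(state, goal):
    # Sparse task-success reward, independent of any reference clip.
    return 1.0 if abs(state - goal) < 0.1 else 0.0

def rollout(task, steps=20):
    if task == "imitation":
        state = 0.0                           # reference-based initialization
        reference = [0.1 * t for t in range(steps)]
        goal = reference[-1]                  # command derived from the clip
    else:
        state = random.uniform(-1.0, 1.0)     # random initialization
        reference = None
        goal = random.uniform(-1.0, 1.0)      # goal sampled independently
    total = 0.0
    for t in range(steps):
        state += 0.1 * policy(state, goal)    # shared obs/action spaces
        if task == "imitation":
            total += imitation_reward(state, reference[t])
        else:
            total += goal_reward(state, goal)
    return total

# Co-training: alternate batches from both tasks through the one policy,
# so gradient updates see dense imitation shaping and pure task success.
returns = {task: rollout(task) for task in ("imitation", "goal")}
```

In a real implementation both rollout modes would feed a single policy-gradient update, so the shared network absorbs human-like structure from the dense imitation rewards while the goal task forces it to generalize beyond the clips.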
Problem

Research questions and friction points this paper is trying to address.

humanoid locomotion
motion imitation
reinforcement learning
generalization
reference motion
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-task reinforcement learning
reference-guided imitation
goal-conditioned policy
motion generalization
humanoid control