🤖 AI Summary
Style imitation and task performance are often conflicting objectives in robot stylistic motion learning. Method: This paper proposes a constrained reinforcement learning framework that formulates the problem as a Markov decision process with task-performance constraints. Its core innovation is an adaptive Lagrange multiplier mechanism that dynamically balances stylistic imitation against task optimality, enabling selective extraction and retention of stylistic features from imperfect demonstrations. Contribution/Results: Evaluated in simulation and on the ANYmal-D quadruped platform, the method maintains strict task performance while reducing mechanical energy consumption by 14.5% and significantly improving gait agility—without compromising style fidelity or behavioral robustness. To our knowledge, this is the first work to systematically address style–task co-optimization under non-ideal demonstrations, establishing a scalable paradigm for high-fidelity, task-aware behavior synthesis in embodied agents.
📝 Abstract
Learning from demonstration has proven effective in robotics for acquiring natural behaviors, such as stylistic motions and lifelike agility, particularly when explicitly defining style-oriented reward functions is challenging. Synthesizing stylistic motions for real-world tasks usually requires balancing task performance and imitation quality. Existing methods generally depend on expert demonstrations closely aligned with task objectives. However, practical demonstrations are often incomplete or unrealistic, causing current methods to boost style at the expense of task performance. To address this issue, we propose formulating the problem as a constrained Markov Decision Process (CMDP). Specifically, we optimize a style-imitation objective with constraints to maintain near-optimal task performance. We introduce an adaptively adjustable Lagrangian multiplier to guide the agent to imitate demonstrations selectively, capturing stylistic nuances without compromising task performance. We validate our approach across multiple robotic platforms and tasks, demonstrating both robust task performance and high-fidelity style learning. On ANYmal-D hardware we show a 14.5% drop in mechanical energy and a more agile gait pattern, showcasing real-world benefits.