Latent Action Priors for Locomotion with Deep Reinforcement Learning

📅 2024-10-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep reinforcement learning (DRL) policies for torque-level robot control are often brittle, produce unnatural motions, and make it difficult to incorporate biomechanical or expert motion priors. Method: We propose Latent Action Prior (LAP), the first framework to encode a small set of expert demonstrations into a transferable, low-dimensional latent action space via a variational autoencoder, explicitly decoupling style imitation from task optimization. LAP integrates with PPO/SAC, employs a style-guided imitation reward, and reparameterizes actions in the latent space without imposing hard performance constraints. Results: On multiple bio-inspired robot simulation tasks, LAP achieves reward performance beyond the expert demonstrations and strong cross-task generalization: average task performance improves by 37%, dynamic time warping (DTW)-based action similarity increases by 52%, and sample efficiency improves 2.1×. Critically, it significantly enhances motion stability and naturalness while preserving control precision.
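The core mechanism can be sketched in a few lines: a VAE is trained on expert actions, then its frozen decoder maps the policy's low-dimensional latent action to full joint torques, so exploration stays near the expert action manifold. The sketch below is illustrative only; the dimensions and the random linear weights are assumptions standing in for the paper's trained VAE.

```python
import numpy as np

rng = np.random.default_rng(0)

ACTION_DIM = 12   # e.g. joint torques of a quadruped (assumed)
LATENT_DIM = 3    # low-dimensional latent action space (assumed)

# Random linear weights stand in for a VAE trained on a small
# dataset of expert demonstration actions.
W_enc = rng.normal(size=(2 * LATENT_DIM, ACTION_DIM)) * 0.1
W_dec = rng.normal(size=(ACTION_DIM, LATENT_DIM)) * 0.1

def encode(a):
    """Map a full action to latent mean and log-variance."""
    h = W_enc @ a
    return h[:LATENT_DIM], h[LATENT_DIM:]

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (the VAE reparameterization trick)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Map a latent action back to a full torque-level action."""
    return W_dec @ z

# At RL time the policy outputs a latent action z; the frozen decoder
# turns it into joint torques, restricting exploration to the
# expert-like part of the action space.
z_policy = rng.normal(size=LATENT_DIM)   # stand-in for a policy output
torques = decode(z_policy)
print(torques.shape)                     # (12,)
```

Because the policy's action space shrinks from `ACTION_DIM` to `LATENT_DIM`, exploration becomes cheaper while the decoder biases actions toward expert-like coordination patterns.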

📝 Abstract
Deep Reinforcement Learning (DRL) enables robots to learn complex behaviors through interaction with the environment. However, due to the unrestricted nature of the learning algorithms, the resulting solutions are often brittle and appear unnatural. This is especially true for learning direct joint-level torque control, as inductive biases are difficult to integrate into the learning process. We propose an inductive bias for learning locomotion that is especially useful for torque control: latent actions learned from a small dataset of expert demonstrations. This prior allows the policy to directly leverage knowledge contained in the expert's actions and facilitates more efficient exploration. We observe that the agent is not restricted to the reward levels of the demonstration, and performance in transfer tasks is improved significantly. Latent action priors combined with style rewards for imitation lead to a closer replication of the expert's behavior. Videos and code are available at https://sites.google.com/view/latent-action-priors.
Problem

Research questions and friction points this paper is trying to address.

DRL policies for direct joint-level torque control are often brittle and produce unnatural motions.
Inductive biases, such as biomechanical or expert motion priors, are difficult to integrate into torque-level learning.
Exploration is inefficient without expert knowledge, and expert behavior is hard to replicate.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent actions from expert demonstrations
Combined with style rewards for imitation
Improved performance in transfer tasks
Oliver Hausdorfer
Learning Systems and Robotics Lab (learnsyslab.org) and the Munich Institute for Robotics and Machine Intelligence, Technical University of Munich, Germany
Alexander von Rohr
TU Munich
Bayesian Optimization · Reinforcement Learning · Control Theory
Eric Lefort
University of Toronto (UofT). Research conducted while at TU Munich.
Angela P. Schoellig
Learning Systems and Robotics Lab (learnsyslab.org) and the Munich Institute for Robotics and Machine Intelligence, Technical University of Munich, Germany