CLAM: Continuous Latent Action Models for Robot Learning from Unlabeled Demonstrations

πŸ“… 2025-05-08
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the scarcity and high cost of expert action labels in imitation learning, this paper proposes a continuous latent action representation learning framework that requires no action annotations. Methodologically, it employs a variational autoencoder to perform unsupervised latent space modeling over unlabeled demonstration videos and introduces an end-to-end jointly optimized action decoder to automatically infer latent action sequences. Furthermore, a policy distillation mechanism is incorporated to reliably ground the learned latent space to physical actionsβ€”even with only a few (or zero) real action labels. Experiments on DMControl, MetaWorld, and a real-world WidowX robotic arm demonstrate that our approach significantly outperforms state-of-the-art methods, achieving 2–3Γ— higher task success rates. These results validate its strong generalization capability and practical efficacy under zero-label conditions.

πŸ“ Abstract
Learning robot policies using imitation learning requires collecting large amounts of costly action-labeled expert demonstrations, which fundamentally limits the scale of training data. A promising approach to address this bottleneck is to harness the abundance of unlabeled observations (e.g., from video demonstrations) to learn latent action labels in an unsupervised way. However, we find that existing methods struggle when applied to complex robot tasks requiring fine-grained motions. We design continuous latent action models (CLAM) which incorporate two key ingredients we find necessary for learning to solve complex continuous control tasks from unlabeled observation data: (a) using continuous latent action labels instead of discrete representations, and (b) jointly training an action decoder to ensure that the latent action space can be easily grounded to real actions with relatively few labeled examples. Importantly, the labeled examples can be collected from non-optimal play data, enabling CLAM to learn performant policies without access to any action-labeled expert data. We demonstrate on continuous control benchmarks in DMControl (locomotion) and MetaWorld (manipulation), as well as on a real WidowX robot arm, that CLAM significantly outperforms prior state-of-the-art methods, remarkably with a 2–3× improvement in task success rate compared to the best baseline. Videos and code can be found at clamrobot.github.io.
Problem

Research questions and friction points this paper is trying to address.

Learning robot policies without costly labeled demonstrations
Improving performance on complex tasks with fine-grained motions
Enabling effective policy learning from unlabeled observation data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuous latent action labels for fine-grained motions
Joint training with action decoder for grounding
Learning from non-optimal play data without expert labels
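The innovations above (a continuous latent action inferred from observation transitions, an observation decoder providing the unsupervised signal, and a jointly trained action decoder grounded with a few labeled transitions) can be illustrated as a single training step. Below is a minimal numpy sketch under assumed shapes: all dimensions, weight matrices, and function names are hypothetical stand-ins for the paper's learned neural networks, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM, LATENT_DIM, ACTION_DIM = 16, 4, 2  # hypothetical sizes

# Hypothetical linear "networks" (the paper trains neural encoders/decoders).
W_enc = 0.1 * rng.normal(size=(2 * OBS_DIM, 2 * LATENT_DIM))   # -> (mu, log_var)
W_obs = 0.1 * rng.normal(size=(OBS_DIM + LATENT_DIM, OBS_DIM)) # observation decoder
W_act = 0.1 * rng.normal(size=(LATENT_DIM, ACTION_DIM))        # action decoder (grounding head)

def encode(o_t, o_t1):
    """Infer a continuous latent action z from an observation transition (VAE-style)."""
    h = np.concatenate([o_t, o_t1]) @ W_enc
    mu, log_var = h[:LATENT_DIM], h[LATENT_DIM:]
    z = mu + np.exp(0.5 * log_var) * rng.normal(size=LATENT_DIM)  # reparameterization
    return z, mu, log_var

def decode_obs(o_t, z):
    """Predict the next observation from (o_t, z): the unsupervised reconstruction signal."""
    return np.concatenate([o_t, z]) @ W_obs

def decode_action(z):
    """Ground the latent action to a real action; supervised by a few labeled pairs."""
    return z @ W_act

# Unlabeled transition: reconstruction + KL train the latent action space.
o_t, o_t1 = rng.normal(size=OBS_DIM), rng.normal(size=OBS_DIM)
z, mu, log_var = encode(o_t, o_t1)
recon_loss = np.mean((decode_obs(o_t, z) - o_t1) ** 2)
kl_loss = -0.5 * np.mean(1.0 + log_var - mu**2 - np.exp(log_var))

# Labeled (possibly non-optimal) transition: additionally supervises the action decoder.
a_label = rng.normal(size=ACTION_DIM)
action_loss = np.mean((decode_action(z) - a_label) ** 2)

total_loss = recon_loss + kl_loss + action_loss  # jointly optimized end to end
```

Because the action decoder is trained jointly rather than fit post hoc, the latent space is shaped to remain decodable into real actions, which is why only a small set of labeled (even non-optimal) transitions suffices for grounding.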
πŸ”Ž Similar Papers
No similar papers found.
👥 Authors
Anthony Liang, University of Southern California (Robot Learning, Reinforcement Learning)
Pavel Czempin, Department of Computer Science, University of Southern California
Matthew Hong, University of Southern California (Robot Learning, Reinforcement Learning)
Yutai Zhou, Department of Computer Science, University of Southern California
Erdem Biyik, Department of Computer Science and Department of Electrical and Computer Engineering, University of Southern California
Stephen Tu, University of Southern California