Offline Imitation Learning upon Arbitrary Demonstrations by Pre-Training Dynamics Representations

📅 2025-08-19
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address performance limitations in offline imitation learning caused by scarce expert demonstrations, this paper proposes a dynamics-aware representation pretraining framework. Methodologically, it (1) introduces a factorized representation of state-transition dynamics, proving that the optimal policy is fully characterized within a low-dimensional latent space, enabling parameter-efficient learning, cross-domain transfer, and single-trajectory imitation; (2) designs an unsupervised contrastive loss based on noise-contrastive estimation, leveraging arbitrary non-expert and simulated data for representation pretraining; and (3) achieves high-fidelity imitation on MuJoCo benchmarks using only a single expert trajectory. Furthermore, on a real quadrupedal robot, the framework successfully learns robust locomotion policies via simulation-based pretraining followed by minimal real-world demonstrations. Empirical results demonstrate significant improvements in sample efficiency, generalization, and real-world deployability compared to prior offline imitation learning approaches.
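The contrastive pre-training idea described above can be sketched concretely. The snippet below is a minimal illustration, not the paper's implementation: it assumes two hypothetical encoders, `encode_phi` for state-action pairs and `encode_mu` for next states (here linear maps `W_phi`, `W_mu` for simplicity), and uses in-batch negatives as a stand-in for the paper's noise-contrastive sampling scheme.

```python
import numpy as np

def encode_phi(sa, W_phi):
    """Hypothetical encoder phi(s, a) -> latent vector (linear for illustration)."""
    return sa @ W_phi

def encode_mu(s_next, W_mu):
    """Hypothetical encoder mu(s') -> latent vector (linear for illustration)."""
    return s_next @ W_mu

def nce_dynamics_loss(sa, s_next, W_phi, W_mu):
    """InfoNCE-style contrastive loss for dynamics representations.

    Each (s, a) pair is scored against all next states in the batch:
    its true successor s' is the positive, the other batch entries
    serve as negatives.
    """
    z = encode_phi(sa, W_phi)            # (B, d) latent for (s, a)
    z_next = encode_mu(s_next, W_mu)     # (B, d) latent for s'
    logits = z @ z_next.T                # (B, B) pairwise similarity scores
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))  # positives lie on the diagonal

# Toy usage on random transition data (dimensions are arbitrary).
rng = np.random.default_rng(0)
sa = rng.normal(size=(8, 6))       # batch of concatenated (state, action)
s_next = rng.normal(size=(8, 4))   # corresponding next states
W_phi = rng.normal(size=(6, 3))
W_mu = rng.normal(size=(4, 3))
loss = nce_dynamics_loss(sa, s_next, W_phi, W_mu)
```

Because the loss only needs transition triples, not rewards or expert labels, it can be minimized on arbitrary non-expert or simulator data, which is what lets the representation be pre-trained before the downstream imitation stage.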

๐Ÿ“ Abstract
Limited data has become a major bottleneck in scaling up offline imitation learning (IL). In this paper, we propose enhancing IL performance under limited expert data by introducing a pre-training stage that learns dynamics representations, derived from factorizations of the transition dynamics. We first theoretically justify that the optimal decision variable of offline IL lies in the representation space, significantly reducing the parameters to learn in the downstream IL. Moreover, the dynamics representations can be learned from arbitrary data collected with the same dynamics, allowing the reuse of massive non-expert data and mitigating the limited data issues. We present a tractable loss function inspired by noise contrastive estimation to learn the dynamics representations at the pre-training stage. Experiments on MuJoCo demonstrate that our proposed algorithm can mimic expert policies with as few as a single trajectory. Experiments on real quadrupeds show that we can leverage pre-trained dynamics representations from simulator data to learn to walk from a few real-world demonstrations.
Problem

Research questions and friction points this paper is trying to address.

Enhancing offline imitation learning with limited expert data
Learning dynamics representations from arbitrary non-expert data
Reducing parameters needed for downstream imitation learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-training dynamics representations from arbitrary data
Using noise contrastive estimation for representation learning
Transferring simulator representations to real-world applications
Haitong Ma
Graduate student, Harvard University
Reinforcement Learning, Robotics, Control Theory

Bo Dai
School of Computational Science and Engineering, Georgia Institute of Technology

Zhaolin Ren
Graduate Student, Harvard University
Control and Optimization, Reinforcement Learning

Yebin Wang
Mitsubishi Electric Research Laboratories
control theory, reinforcement learning, mechatronics, batteries

Na Li
School of Engineering and Applied Sciences, Harvard University; Mitsubishi Electric Research Laboratories (MERL)