AI Summary
To address performance limitations in offline imitation learning caused by scarce expert demonstrations, this paper proposes a dynamics-aware representation pretraining framework. Methodologically, it (1) introduces a factorized representation of state-transition dynamics, proving that the optimal policy is fully characterized within a low-dimensional latent space, enabling parameter-efficient learning, cross-domain transfer, and single-trajectory imitation; (2) designs an unsupervised contrastive loss based on noise-contrastive estimation, leveraging arbitrary non-expert and simulated data for representation pretraining; and (3) achieves high-fidelity imitation on MuJoCo benchmarks using only a single expert trajectory. Furthermore, on a real quadrupedal robot, the framework successfully learns robust locomotion policies via simulation-based pretraining followed by minimal real-world demonstrations. Empirical results demonstrate significant improvements in sample efficiency, generalization, and real-world deployability compared to prior offline imitation learning approaches.
Abstract
Limited data has become a major bottleneck in scaling up offline imitation learning (IL). In this paper, we propose enhancing IL performance under limited expert data by introducing a pre-training stage that learns dynamics representations derived from factorizations of the transition dynamics. We first theoretically justify that the optimal decision variable of offline IL lies in the representation space, significantly reducing the number of parameters to learn in the downstream IL. Moreover, the dynamics representations can be learned from arbitrary data collected under the same dynamics, allowing the reuse of massive non-expert data and mitigating the limited-data issue. We present a tractable loss function, inspired by noise contrastive estimation, for learning the dynamics representations at the pre-training stage. Experiments on MuJoCo demonstrate that our proposed algorithm can mimic expert policies with as few as a single trajectory. Experiments on a real quadruped show that we can leverage dynamics representations pre-trained on simulator data to learn to walk from a few real-world demonstrations.
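To make the pre-training idea concrete, here is a minimal sketch of a noise-contrastive (InfoNCE-style) loss for learning dynamics representations. It assumes a factorization of the transition dynamics of the form T(s' | s, a) ∝ exp(φ(s, a)ᵀ μ(s')), where φ and μ are learned encoders; the encoder names, the linear parameterization, and the toy dynamics are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM, LATENT_DIM, BATCH = 6, 2, 4, 8

# Toy linear "encoders"; a real implementation would use neural networks
# trained by gradient descent on the loss below.
W_phi = rng.normal(size=(STATE_DIM + ACTION_DIM, LATENT_DIM))
W_mu = rng.normal(size=(STATE_DIM, LATENT_DIM))

def phi(s, a):
    """Latent representation of a state-action pair."""
    return np.concatenate([s, a], axis=-1) @ W_phi

def mu(s_next):
    """Latent representation of a next state."""
    return s_next @ W_mu

def info_nce_loss(s, a, s_next):
    """Contrastive loss: for each (s, a), the observed s' is the positive
    and the other next states in the batch serve as negatives."""
    logits = phi(s, a) @ mu(s_next).T            # (BATCH, BATCH) scores
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # cross-entropy on diagonal

# Transitions can come from any policy sharing the same dynamics
# (e.g. random exploration or a simulator) -- no expert labels needed.
s = rng.normal(size=(BATCH, STATE_DIM))
a = rng.normal(size=(BATCH, ACTION_DIM))
s_next = s + 0.1 * rng.normal(size=(BATCH, STATE_DIM))  # toy dynamics

loss = info_nce_loss(s, a, s_next)
print(f"contrastive loss: {loss:.4f}")
```

Because the loss only requires (s, a, s') triples, any data collected under the same dynamics can be used at this stage; the downstream imitation step then operates in the low-dimensional latent space defined by φ.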