AI Summary
To address performance limitations in offline imitation learning caused by scarce expert demonstrations, this paper proposes a dynamics-aware representation pretraining framework. Methodologically, it (1) introduces a factorized representation of state-transition dynamics, proving that the optimal policy is fully characterized within a low-dimensional latent space, enabling parameter-efficient learning, cross-domain transfer, and single-trajectory imitation; (2) designs an unsupervised contrastive loss based on noise-contrastive estimation, leveraging arbitrary non-expert and simulated data for representation pretraining; and (3) achieves high-fidelity imitation on MuJoCo benchmarks using only a single expert trajectory. Furthermore, on a real quadrupedal robot, the framework successfully learns robust locomotion policies via simulation-based pretraining followed by minimal real-world demonstrations. Empirical results demonstrate significant improvements in sample efficiency, generalization, and real-world deployability compared to prior offline imitation learning approaches.
Abstract
Limited data has become a major bottleneck in scaling up offline imitation learning (IL). In this paper, we propose enhancing IL performance under limited expert data by introducing a pre-training stage that learns dynamics representations derived from factorizations of the transition dynamics. We first theoretically justify that the optimal decision variable of offline IL lies in the representation space, significantly reducing the number of parameters to learn in the downstream IL. Moreover, the dynamics representations can be learned from arbitrary data collected under the same dynamics, allowing the reuse of massive non-expert data and mitigating the limited-data issue. We present a tractable loss function, inspired by noise contrastive estimation, for learning the dynamics representations at the pre-training stage. Experiments on MuJoCo demonstrate that our proposed algorithm can mimic expert policies with as few as a single trajectory. Experiments on a real quadruped show that we can leverage dynamics representations pre-trained on simulator data to learn to walk from a few real-world demonstrations.
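To make the pre-training idea concrete, here is a minimal sketch of a noise-contrastive (InfoNCE-style) loss for learning dynamics representations. It assumes a factorization of the transition dynamics of the form T(s' | s, a) ∝ exp(φ(s, a)ᵀ μ(s')), where φ and μ are learned encoders; the encoder names, the linear parameterization, and the toy dynamics are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM, LATENT_DIM, BATCH = 6, 2, 4, 8

# Toy linear "encoders"; a real implementation would use neural networks
# trained by gradient descent on the loss below.
W_phi = rng.normal(size=(STATE_DIM + ACTION_DIM, LATENT_DIM))
W_mu = rng.normal(size=(STATE_DIM, LATENT_DIM))

def phi(s, a):
    """Latent representation of a state-action pair."""
    return np.concatenate([s, a], axis=-1) @ W_phi

def mu(s_next):
    """Latent representation of a next state."""
    return s_next @ W_mu

def info_nce_loss(s, a, s_next):
    """Contrastive loss: for each (s, a), the observed s' is the positive
    and the other next states in the batch serve as negatives."""
    logits = phi(s, a) @ mu(s_next).T            # (BATCH, BATCH) scores
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # cross-entropy on diagonal

# Transitions can come from any policy sharing the same dynamics
# (e.g. random exploration or a simulator) -- no expert labels needed.
s = rng.normal(size=(BATCH, STATE_DIM))
a = rng.normal(size=(BATCH, ACTION_DIM))
s_next = s + 0.1 * rng.normal(size=(BATCH, STATE_DIM))  # toy dynamics

loss = info_nce_loss(s, a, s_next)
print(f"contrastive loss: {loss:.4f}")
```

Because the loss only requires (s, a, s') triples, any data collected under the same dynamics can be used at this stage; the downstream imitation step then operates in the low-dimensional latent space defined by φ.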