Imitation from Observations with Trajectory-Level Generative Embeddings

📅 2026-01-01
🏛️ arXiv.org
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work addresses offline imitation learning from observations in the regime where expert demonstrations are scarce and diverge sharply in distribution from the suboptimal offline data. Because the supports of the two datasets barely overlap, standard methods receive little learning signal. The authors propose Trajectory-level Generative Embedding (TGE), which estimates expert state density in the latent space of a temporal diffusion model and uses it to construct a dense, smooth proxy reward. By capturing long-horizon temporal dynamics through trajectory-level generative embeddings, TGE mitigates the distribution mismatch. Unlike existing approaches that rely on one-step transition models or strict support constraints, TGE exploits heterogeneous offline data more efficiently. Experimental results show that TGE consistently matches or surpasses state-of-the-art offline LfO algorithms across multiple locomotion and manipulation tasks from the D4RL benchmark.
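
A hedged reading of that reward construction, in our own notation (the paper's exact formula may differ): if φ embeds a window of states into the diffusion model's latent space and p̂_E is a density estimate fit to embeddings of expert trajectories, the proxy reward plausibly takes the form

```latex
% Sketch only: \phi, \hat{p}_E, and the window length k are our notation,
% inferred from the summary above rather than taken from the paper.
% \phi(\tau_{t-k:t}) : latent embedding of the state window ending at step t
% \hat{p}_E          : density estimate fit to expert trajectory embeddings
\tilde{r}(s_t) \;=\; \log \hat{p}_E\!\big(\phi(\tau_{t-k:t})\big)
```

Such a reward is dense and smooth wherever the latent density is defined, which is what lets it supply signal even where expert and offline supports barely overlap.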

📝 Abstract
We consider offline imitation learning from observations (LfO), where expert demonstrations are scarce and the available suboptimal offline data lie far from the expert behavior. Many existing distribution-matching approaches struggle in this regime because they impose strict support constraints and rely on brittle one-step models, making it hard to extract a useful signal from imperfect data. To tackle this challenge, we propose TGE, a trajectory-level generative embedding for offline LfO that constructs a dense, smooth surrogate reward by estimating the expert state density in the latent space of a temporal diffusion model trained on offline trajectory data. By leveraging the smooth geometry of the learned diffusion embedding, TGE captures long-horizon temporal dynamics and effectively bridges the gap between disjoint supports, ensuring a robust learning signal even when the offline data are distributionally distinct from the expert. Empirically, the proposed approach consistently matches or outperforms prior offline LfO methods across a range of D4RL locomotion and manipulation benchmarks.
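
As a rough illustration of the pipeline the abstract describes (embed trajectory windows, fit an expert density in the latent space, score with log-density), here is a minimal Python sketch. The `TemporalDiffusionEncoder` class, its random-projection `encode`, the kernel density estimator, and the window length `k` are hypothetical placeholders of our own; the paper's actual embedding comes from a temporal diffusion model trained on offline trajectory data.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

class TemporalDiffusionEncoder:
    """Hypothetical stand-in for the paper's learned diffusion latent map.

    A fixed random projection replaces the trained temporal diffusion
    model; only the embed-then-score structure mirrors the abstract.
    """
    def __init__(self, state_dim: int, latent_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.proj = rng.normal(size=(state_dim, latent_dim)) / np.sqrt(state_dim)

    def encode(self, window: np.ndarray) -> np.ndarray:
        # window: (k, state_dim) -> pooled latent vector of shape (latent_dim,)
        return np.tanh(window @ self.proj).mean(axis=0)

def trajectory_embeddings(encoder, traj: np.ndarray, k: int) -> np.ndarray:
    # One embedding per length-k state window sliding over the trajectory.
    return np.stack([encoder.encode(traj[t - k:t]) for t in range(k, len(traj) + 1)])

# Fit an expert density in latent space, then use log-density as the reward.
state_dim, latent_dim, k = 11, 8, 4
encoder = TemporalDiffusionEncoder(state_dim, latent_dim)

expert_traj = np.random.default_rng(1).normal(size=(200, state_dim))  # placeholder expert states
expert_z = trajectory_embeddings(encoder, expert_traj, k)
kde = KernelDensity(bandwidth=0.3).fit(expert_z)                      # expert density estimate

def surrogate_reward(traj: np.ndarray) -> np.ndarray:
    # Dense surrogate reward: log expert-density of each window's embedding.
    return kde.score_samples(trajectory_embeddings(encoder, traj, k))

offline_traj = np.random.default_rng(2).normal(size=(50, state_dim))  # placeholder suboptimal data
print(surrogate_reward(offline_traj).shape)  # (47,): one reward per window
```

Scoring in a trajectory-level latent space rather than raw state space is the key design choice: the smooth learned geometry keeps the density informative even when the offline data's raw-state support barely intersects the expert's.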
Problem

Research questions and friction points this paper is trying to address.

imitation learning from observations · offline imitation learning · expert demonstrations · distribution mismatch · suboptimal data

Innovation

Methods, ideas, or system contributions that make the work stand out.

trajectory-level generative embedding · offline imitation learning from observations · temporal diffusion model · surrogate reward · distribution mismatch

Yongtao Qu
University of North Carolina at Chapel Hill, Chapel Hill, NC, 27514
Shangzhe Li
University of North Carolina at Chapel Hill, Chapel Hill, NC, 27514
Weitong Zhang
Assistant Professor, SDSS, UNC Chapel Hill
Reinforcement Learning · Optimization · AI4Science