🤖 AI Summary
Offline meta-reinforcement learning struggles to learn effective task representations because no supervisory signal is available, which limits policy generalization to unseen tasks. This work proposes contextual latent world models, which introduce a task-conditioned temporal consistency constraint into this setting. By training a context encoder jointly with a latent world model, the approach yields task representations that not only distinguish between tasks but also capture the differences in their underlying dynamics. The authors report that the method significantly improves generalization to unseen tasks on the MuJoCo, Contextual-DeepMind Control, and Meta-World benchmarks.
📝 Abstract
Offline meta-reinforcement learning seeks to learn policies that generalize across related tasks from fixed datasets. Context-based methods infer a task representation from transition histories, but learning effective task representations without supervision remains a challenge. In parallel, latent world models have demonstrated strong self-supervised representation learning through temporal consistency. We introduce contextual latent world models, which condition latent world models on inferred task representations and train them jointly with the context encoder. This enforces task-conditioned temporal consistency, yielding task representations that capture task-dependent dynamics rather than merely discriminating between tasks. Our method learns more expressive task representations and significantly improves generalization to unseen tasks across MuJoCo, Contextual-DeepMind Control, and Meta-World benchmarks.
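The idea of task-conditioned temporal consistency can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the linear layers, the mean-pooled context encoder, the squared-error loss, the dimension sizes, and every name below (`context_encoder`, `consistency_loss`, `fake_transitions`) are assumptions made for illustration. It only computes the forward pass of a loss in which the latent dynamics take the inferred task representation as an extra input.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes: state, action, latent state, task representation.
S, A, Z, C = 4, 2, 8, 3

def init(n_in, n_out):
    """Small random weight matrix (stand-in for a learned layer)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out))

params = {
    "ctx": init(S + A + 1 + S, C),  # context encoder over (s, a, r, s') tuples
    "enc": init(S, Z),              # state encoder into latent space
    "dyn": init(Z + A + C, Z),      # latent dynamics, conditioned on the task representation
}

def context_encoder(params, s, a, r, s_next):
    """Embed each transition, then mean-pool so the result ignores ordering."""
    x = np.concatenate([s, a, r[:, None], s_next], axis=1)
    return np.tanh(x @ params["ctx"]).mean(axis=0)

def encode(params, s):
    return np.tanh(s @ params["enc"])

def consistency_loss(params, context, batch):
    """Squared error between predicted and encoded next latent states."""
    c = context_encoder(params, *context)          # inferred task representation
    s, a, _, s_next = batch
    z = encode(params, s)
    c_tiled = np.broadcast_to(c, (len(s), C))      # same task vector for every sample
    z_pred = np.concatenate([z, a, c_tiled], axis=1) @ params["dyn"]
    # In an autodiff framework this target would typically be held fixed
    # (stop-gradient); here we only evaluate the forward pass.
    z_target = encode(params, s_next)
    return float(np.mean(np.sum((z_pred - z_target) ** 2, axis=1)))

def fake_transitions(n):
    """Synthetic (s, a, r, s') data, used only to exercise the code."""
    s = rng.normal(size=(n, S))
    a = rng.normal(size=(n, A))
    r = rng.normal(size=n)
    s_next = s + 0.1 * rng.normal(size=(n, S))
    return s, a, r, s_next

context = fake_transitions(16)  # transition history from one task
batch = fake_transitions(32)    # transitions used for the consistency loss
loss = consistency_loss(params, context, batch)
print("temporal consistency loss:", loss)
```

Because the loss depends on the output of `context_encoder`, minimizing it with an autodiff framework would update the context encoder together with the world model. This is the sense in which the task representation is pushed to encode task-dependent dynamics rather than only to separate tasks.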