Task Aware Dreamer for Task Generalization in Reinforcement Learning

📅 2023-03-09
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work addresses cross-task generalization in reinforcement learning, where agents must rapidly adapt to unseen tasks that are dynamically similar yet differ in reward functions. To this end, the authors propose Task-Aware Dreamer (TAD), a framework introducing (i) a reward-aware world model and (ii) a task-discriminative variational objective, alongside a Task Distribution Relevance (TDR) metric that quantifies inter-task divergence. TAD combines variational-inference-based world modeling, reward-conditioned latent representation learning, Dreamer-style model-based prediction and policy optimization, and TDR-driven analysis of task distributions. Experiments across image- and state-based multi-task benchmarks demonstrate substantial improvements in parallel training efficiency and zero-shot generalization. Notably, TAD significantly outperforms conventional Markovian policies, especially under high-TDR conditions, highlighting its efficacy on reward-divergent task distributions.
📝 Abstract
A long-standing goal of reinforcement learning is to acquire agents that can learn on training tasks and generalize well to unseen tasks that may share similar dynamics but have different reward functions. The ability to generalize across tasks is important as it determines an agent's adaptability to real-world scenarios where reward mechanisms might vary. In this work, we first show that training a general world model can exploit the shared structure of these tasks and help train more generalizable agents. Extending world models to the task-generalization setting, we introduce a novel method named Task Aware Dreamer (TAD), which integrates reward-informed features to identify consistent latent characteristics across tasks. Within TAD, we compute the variational lower bound of the sample-data log-likelihood, which introduces a new term designed to differentiate tasks via their states, as the optimization objective of our reward-informed world models. To demonstrate the advantages of the reward-informed policy in TAD, we introduce a new metric called Task Distribution Relevance (TDR), which quantitatively measures the relevance of different tasks. For task distributions with high TDR, i.e., where tasks differ significantly, we show that Markovian policies struggle to distinguish them, so reward-informed policies as in TAD are necessary. Extensive experiments on both image-based and state-based tasks show that TAD can significantly improve performance when handling different tasks simultaneously, especially those with high TDR, and displays strong generalization to unseen tasks.
Problem

Research questions and friction points this paper is trying to address.

Improve task generalization in reinforcement learning
Integrate reward-informed features across tasks
Enhance adaptability to unseen task variations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates reward-informed features into the world model
Derives a variational lower bound of the data log-likelihood with a task-discrimination term
Introduces the Task Distribution Relevance (TDR) metric
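The page does not spell out the variational lower bound itself. Purely as an illustration, a generic Dreamer-style evidence lower bound over observations $o_t$ and rewards $r_t$, with latent states $s_t$ inferred by an encoder $q$, has the following form; TAD's actual objective additionally contains a task-discrimination term and may differ in detail:

```latex
\ln p(o_{1:T}, r_{1:T} \mid a_{1:T}) \;\ge\;
\sum_{t=1}^{T} \mathbb{E}_{q}\Big[
    \underbrace{\ln p(o_t \mid s_t)}_{\text{reconstruction}}
  + \underbrace{\ln p(r_t \mid s_t)}_{\text{reward prediction}}
  - \underbrace{\mathrm{KL}\big(q(s_t \mid s_{t-1}, a_{t-1}, o_t, r_t)\,\big\|\,p(s_t \mid s_{t-1}, a_{t-1})\big)}_{\text{latent regularizer}}
\Big]
```

Conditioning the encoder $q$ on the reward $r_t$ is what makes the latent representation "reward-informed": tasks with identical dynamics but different reward functions can then be separated in latent space.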
Chengyang Ying
Tsinghua University
Machine Learning, Reinforcement Learning, Embodied AI
Zhongkai Hao
Tsinghua University
Machine Learning, AI for Science, Physics-Informed Machine Learning
Xinning Zhou
Department of Computer Science & Technology, Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University
Hang Su
Department of Computer Science & Technology, Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University
Songming Liu
PhD in Computer Science, Tsinghua University
AI, Machine Learning, Robotics, Physics
Dong Yan
AI Chief Expert, Bosch
Reinforcement Learning, Foundation Model
Jun Zhu
Department of Computer Science & Technology, Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University