🤖 AI Summary
This work addresses cross-task generalization in reinforcement learning, where agents must rapidly adapt to unseen tasks that share similar dynamics but differ in reward functions. To this end, the authors propose Task Aware Dreamer (TAD), a framework introducing (i) a reward-aware world model and (ii) a task-discriminative variational objective, alongside the Task Distribution Relevance (TDR) metric to quantify inter-task divergence. TAD integrates a variational-inference-based world model, reward-conditioned latent representation learning, Dreamer-style model-based prediction and policy optimization, and a TDR-driven analysis of when reward-informed policies are necessary. Experiments across image-based and state-based multi-task benchmarks demonstrate substantial improvements in simultaneous multi-task training and zero-shot generalization. Notably, TAD significantly outperforms conventional Markovian policies, especially under high-TDR conditions, highlighting its efficacy on reward-divergent task distributions.
📝 Abstract
A long-standing goal of reinforcement learning is to train agents that learn on training tasks and generalize well to unseen tasks that may share similar dynamics but have different reward functions. The ability to generalize across tasks is important because it determines an agent's adaptability to real-world scenarios where reward mechanisms vary. In this work, we first show that training a general world model can exploit the shared structure across these tasks and help train more generalizable agents. Extending world models to the task-generalization setting, we introduce a novel method named Task Aware Dreamer (TAD), which integrates reward-informed features to identify latent characteristics consistent across tasks. Within TAD, we compute the variational lower bound of the sample-data log-likelihood, which introduces a new term designed to differentiate tasks via their states, as the optimization objective of our reward-informed world models. To demonstrate the advantage of the reward-informed policy in TAD, we introduce a new metric, Task Distribution Relevance (TDR), which quantitatively measures the relevance of different tasks. For tasks with high TDR, i.e., tasks that differ significantly, we show that Markovian policies struggle to distinguish them, so reward-informed policies are necessary in TAD. Extensive experiments on both image-based and state-based tasks show that TAD significantly improves performance when handling different tasks simultaneously, especially those with high TDR, and exhibits strong generalization to unseen tasks.
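The abstract's claim that Markovian policies cannot distinguish high-TDR tasks can be illustrated with a minimal sketch. The toy two-armed-bandit tasks, policy functions, and `run_episode` helper below are hypothetical illustrations (not the paper's benchmarks or algorithm): two tasks share identical dynamics (a single state) but have opposite rewards, so a policy that conditions only on the current state must act identically in both, while a policy that conditions on past rewards can infer which task it is in.

```python
import random

def run_episode(task, policy, steps=20):
    """Two tasks with identical dynamics (one state) but opposite rewards:
    task 0 rewards arm 0, task 1 rewards arm 1."""
    history, total = [], 0.0
    for _ in range(steps):
        a = policy(history)
        r = 1.0 if a == task else 0.0
        history.append((a, r))
        total += r
    return total

def markov_policy(history):
    # Sees only the (single, identical) current state, so it cannot tell
    # the tasks apart -- at best 50% average reward across both tasks.
    return random.choice([0, 1])

def reward_informed_policy(history):
    # Infers the task identity from past (action, reward) pairs.
    if not history:
        return 0            # no evidence yet: probe arm 0
    a, r = history[0]       # the first pull reveals the task
    return a if r > 0 else 1 - a
```

Here the reward-informed policy loses at most one step of reward before identifying the task, whereas the Markovian policy averages half the reward on both tasks. TAD operates analogously at scale: its reward-informed world model encodes reward history into the latent state so the policy can disambiguate tasks that are indistinguishable from observations alone.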