A Tutorial on Meta-Reinforcement Learning

📅 2023-01-19
🏛️ Found. Trends Mach. Learn.
📈 Citations: 126 (Influential: 6)
🤖 AI Summary
Deep reinforcement learning (RL) suffers from low data efficiency and limited generality of the policies it produces. Meta-reinforcement learning (meta-RL) addresses this by casting the design of better RL algorithms as a learning problem itself, so that an agent can rapidly adapt to any new task drawn from a task distribution using as little data as possible. The survey organizes meta-RL research along two dimensions: (i) whether a task distribution is present, and (ii) the learning budget available for each individual task. Within these clusters it covers problem formulations, core paradigms, and representative algorithms, including optimization-based (MAML-style), recurrent (RNN-based), contextual, and Bayesian approaches, as well as applications. It closes with the open problems that stand between meta-RL and its place in the standard deep RL toolbox.
📝 Abstract
While deep reinforcement learning (RL) has fueled multiple high-profile successes in machine learning, it is held back from more widespread adoption by its often poor data efficiency and the limited generality of the policies it produces. A promising approach for alleviating these limitations is to cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL. Meta-RL is most commonly studied in a problem setting where, given a distribution of tasks, the goal is to learn a policy that is capable of adapting to any new task from the task distribution with as little data as possible. In this survey, we describe the meta-RL problem setting in detail as well as its major variations. We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task. Using these clusters, we then survey meta-RL algorithms and applications. We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
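The few-shot adaptation setting described in the abstract can be sketched with a toy MAML-style (optimization-based) inner/outer loop. Everything below is an illustrative assumption, not the paper's code: each "task" is a scalar target, the "policy" is a single parameter `theta`, per-task return is `-(theta - target)^2`, and the meta-update uses a first-order (FOMAML-like) approximation.

```python
# Toy sketch of optimization-based meta-RL (MAML-style), under the
# simplifying assumptions stated above. A real implementation would
# replace the quadratic "return" with policy-gradient estimates on MDPs.

def grad(theta, target):
    # Gradient of the per-task loss (negative return) w.r.t. theta.
    return 2.0 * (theta - target)

def inner_adapt(theta, target, lr=0.1, steps=1):
    # Inner loop: few-shot adaptation under a small per-task budget.
    for _ in range(steps):
        theta = theta - lr * grad(theta, target)
    return theta

def meta_train(task_dist, meta_lr=0.05, iters=200):
    # Outer loop: learn an initialization that adapts well on average
    # across the task distribution (first-order meta-gradient).
    theta = 2.0
    for _ in range(iters):
        g = sum(grad(inner_adapt(theta, t), t) for t in task_dist)
        theta = theta - meta_lr * g / len(task_dist)
    return theta

tasks = [-1.0, 0.0, 1.0]
theta0 = meta_train(tasks)
# For this symmetric task distribution, theta0 converges toward the task
# mean (0.0), the initialization from which one inner step reaches every
# task fastest.
```

The inner loop is the "adapting to a new task with little data" phase from the abstract; the outer loop is the meta-learning phase that shapes the initialization so that adaptation succeeds across the whole distribution.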
Problem

Research questions and friction points this paper is trying to address.

Improving data efficiency in deep reinforcement learning
Enhancing policy generality across diverse tasks
Developing meta-RL algorithms for rapid task adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Casts RL algorithm design as a learning problem (meta-RL) to improve data efficiency
Learns policies that adapt rapidly to new tasks from a task distribution
Clusters meta-RL research by presence of a task distribution and per-task learning budget