Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

📅 2026-05-14
📈 Citations: 0
Influential: 0

🤖 AI Summary
In multi-task reinforcement learning, uniform task sampling over-allocates environment interactions to easy tasks and under-allocates them to hard tasks, so agents solve easy tasks quickly but learn slowly on difficult ones. This work formulates multi-task learning as a feasibility problem and derives a minimax objective that minimizes the worst-case return gap across tasks, where the return gap is the difference between a desired target return and the agent's return on a task. Building on this formulation, the authors propose Distributionally Robust Adaptive Task Sampling (DRATS), an adaptive sampling strategy grounded in distributionally robust optimization that dynamically prioritizes the tasks furthest from being solved. Unlike prior work that attributes the imbalance to conflicting task gradients, the approach targets data allocation and does not rely on gradient manipulation or specialized architectures. On the MetaWorld MT10 and MT50 benchmarks, DRATS improves data efficiency and worst-task performance compared to existing task sampling methods.
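One way to write out the feasibility problem and the minimax objective described above is sketched below. The notation is assumed for illustration and may differ from the paper's: K is the number of tasks, \theta the policy parameters, J_i(\theta) the expected return on task i, R_i^{\star} the desired target return for task i, and \Delta_K the probability simplex over tasks. The final equality holds because a linear function over the simplex attains its maximum at a vertex, which is what connects the worst-case gap to a distribution q over tasks.

```latex
% Feasibility problem: find a policy that reaches every task's target return.
\text{find } \theta \quad \text{s.t.} \quad J_i(\theta) \ge R_i^{\star}, \qquad i = 1, \dots, K

% Minimax form: minimize the worst-case return gap across tasks.
\min_{\theta} \; \max_{i \in \{1, \dots, K\}} \bigl( R_i^{\star} - J_i(\theta) \bigr)
\;=\;
\min_{\theta} \; \max_{q \in \Delta_K} \; \sum_{i=1}^{K} q_i \bigl( R_i^{\star} - J_i(\theta) \bigr)
```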
📝 Abstract
Multi-task reinforcement learning (MTRL) aims to train a single agent to efficiently optimize performance across multiple tasks simultaneously. However, jointly optimizing all tasks often yields imbalanced learning: agents quickly solve easy tasks but learn slowly on harder ones. While prior work primarily attributes this imbalance to conflicting task gradients and proposes gradient manipulation or specialized architectures to address it, we instead focus on a distinct and under-explored challenge: imbalanced data allocation. Standard MTRL allocates an equal number of environment interactions to each task, which over-allocates data to easy tasks that require relatively few interactions to solve and under-allocates data to hard tasks that require substantially more experience to solve. To address this challenge, we introduce Distributionally Robust Adaptive Task Sampling (DRATS), an algorithm that adaptively prioritizes sampling tasks furthest from being solved. We derive DRATS by formalizing MTRL as a feasibility problem from which we derive a minimax objective for minimizing the worst-case return gap, the difference between a desired target return and the agent's return on a task. In benchmarks like MetaWorld-MT10 and MT50, DRATS improves data efficiency and increases worst-task performance compared to existing task sampling algorithms.
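The idea of prioritizing tasks by return gap can be illustrated with a short sketch. It is not the DRATS algorithm as published: it assumes a generic exponentiated-gradient update on the task distribution, as is common in distributionally robust optimization, and the function name, `step_size`, and `uniform_mix` are hypothetical choices made for this example.

```python
import numpy as np


def update_task_distribution(q, returns, targets, step_size=1.0, uniform_mix=0.1):
    """One exponentiated-gradient step on the task sampling distribution.

    Assumption: this is a generic DRO-style update, not the paper's exact rule.
    Tasks with a larger return gap (target - return) receive more probability.
    """
    q = np.asarray(q, dtype=float)
    gaps = np.asarray(targets, dtype=float) - np.asarray(returns, dtype=float)

    # Multiplicative-weights update, computed in log space for stability.
    logits = np.log(q) + step_size * gaps
    logits -= logits.max()
    new_q = np.exp(logits)
    new_q /= new_q.sum()

    # Mix with the uniform distribution so every task keeps being sampled
    # (hypothetical safeguard, not taken from the source).
    num_tasks = len(new_q)
    return (1.0 - uniform_mix) * new_q + uniform_mix / num_tasks


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = np.full(3, 1.0 / 3.0)          # start from uniform sampling
    returns = [0.9, 0.5, 0.1]          # illustrative per-task returns
    targets = [1.0, 1.0, 1.0]          # illustrative target returns
    q = update_task_distribution(q, returns, targets)
    next_task = rng.choice(len(q), p=q)  # task to collect data from next
    print(q, next_task)
```

In this toy setting the third task has the largest return gap, so it receives the highest sampling probability after the update, while the uniform mixing term keeps the nearly solved first task from being dropped entirely.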
Problem

Research questions and friction points this paper is trying to address.

Multi-Task Reinforcement Learning
Imbalanced Data Allocation
Task Sampling
Distributionally Robust Optimization
Worst-Case Performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Task Sampling
Distributionally Robust Optimization
Multi-Task Reinforcement Learning
Data Allocation Imbalance
Minimax Objective