Beyond Fixed Tasks: Unsupervised Environment Design for Task-Level Pairs

📅 2025-11-16
🤖 AI Summary
Problem: In complex environments, randomly pairing tasks with levels often yields unsolvable instances, which hinders the training of generalist agents. Method: This paper introduces ATLAS, an automatic curriculum learning framework that jointly generates solvable yet challenging task-level pairs. ATLAS extends unsupervised environment design (UED), which previously assumed a fixed task, to curricula over both tasks and levels. Tasks are modeled as reward machines, and mutations that exploit the structure of both tasks and levels let the two evolve together. Contribution/Results: On an evaluation suite built on Minigrid, ATLAS vastly outperforms random sampling, particularly when solvable pairs are unlikely to be sampled, and structure-aware mutations accelerate convergence to performant policies.

📝 Abstract
Training general agents to follow complex instructions (tasks) in intricate environments (levels) remains a core challenge in reinforcement learning. Random sampling of task-level pairs often produces unsolvable combinations, highlighting the need to co-design tasks and levels. While unsupervised environment design (UED) has proven effective at automatically designing level curricula, prior work has only considered a fixed task. We present ATLAS (Aligning Tasks and Levels for Autocurricula of Specifications), a novel method that generates joint autocurricula over tasks and levels. Our approach builds upon UED to automatically produce solvable yet challenging task-level pairs for policy training. To evaluate ATLAS and drive progress in the field, we introduce an evaluation suite that models tasks as reward machines in Minigrid levels. Experiments demonstrate that ATLAS vastly outperforms random sampling approaches, particularly when sampling solvable pairs is unlikely. We further show that mutations leveraging the structure of both tasks and levels accelerate convergence to performant policies.
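To make the phrase "tasks as reward machines" concrete, here is a minimal sketch of a reward machine as a finite-state machine driven by observed propositions. It illustrates the general concept only; the class name, state labels, and the "key then door" task are assumptions for illustration and are not taken from the ATLAS evaluation suite.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine whose transitions fire on observed propositions."""

    initial: str
    accepting: set
    # transitions[(state, proposition)] = (next_state, reward)
    transitions: dict = field(default_factory=dict)

    def step(self, state, proposition):
        # Unlisted (state, proposition) pairs keep the state and give zero reward.
        return self.transitions.get((state, proposition), (state, 0.0))

    def run(self, propositions):
        """Return (final_state, total_reward) for a sequence of propositions."""
        state, total = self.initial, 0.0
        for p in propositions:
            state, reward = self.step(state, p)
            total += reward
        return state, total


# Hypothetical task: "pick up the key, then open the door".
rm = RewardMachine(
    initial="u0",
    accepting={"u2"},
    transitions={
        ("u0", "key"): ("u1", 0.0),
        ("u1", "door"): ("u2", 1.0),
    },
)

# Reaching the door before the key has no effect; the order matters.
final_state, total_reward = rm.run(["door", "key", "door"])
task_done = final_state in rm.accepting
```

Here the agent only receives reward once the subgoals are achieved in the order the machine prescribes, which is what lets a single policy be conditioned on many different instructions.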
Problem

Research questions and friction points this paper is trying to address.

Training agents to follow complex instructions in intricate environments
Random sampling often creates unsolvable task-level combinations
Prior methods only considered fixed tasks, not joint task-level design
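A toy count shows why independently sampled tasks and levels are often unsolvable. The sketch below assumes a deliberately simplified setting that is not from the paper: tasks are ordered pairs of objects, levels contain exactly two of four objects, and a pair counts as solvable when the level contains every object the task names (reachability is ignored).

```python
from itertools import combinations, permutations

OBJECTS = ["key", "door", "ball", "box"]

# Hypothetical tasks: ordered sequences of two distinct objects to interact with.
tasks = list(permutations(OBJECTS, 2))
# Hypothetical levels: each level contains exactly two of the four objects.
levels = [set(c) for c in combinations(OBJECTS, 2)]


def is_solvable(task, level):
    # Simplifying assumption: solvable iff the level holds every object in the task.
    return all(obj in level for obj in task)


# Enumerate every task-level pair instead of sampling, so the count is exact.
pairs = [(t, l) for t in tasks for l in levels]
solvable = sum(is_solvable(t, l) for t, l in pairs)
fraction = solvable / len(pairs)
```

Even in this tiny setting only one pair in six is solvable, and the fraction shrinks as tasks get longer or levels get sparser, which is the regime where the abstract reports the largest gains over random sampling.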
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates joint autocurricula over tasks and levels
Builds upon unsupervised environment design methodology
Uses mutations leveraging structure of tasks and levels
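The idea of a mutation that uses the structure of both the task and the level can be sketched as follows. This is an illustrative assumption, not the ATLAS mutation operators, which the summary does not specify: the hypothetical `mutate_pair` extends the task by one subgoal and, in the same step, places the newly required object in the level, so the pair stays solvable under the simplified check used here.

```python
import random

OBJECTS = ["key", "door", "ball", "box"]


def is_solvable(task, level):
    # Simplifying assumption: solvable iff the level holds every object in the task.
    return all(obj in level for obj in task)


def mutate_pair(task, level, rng):
    """Append one subgoal to the task and add the object it needs to the level."""
    new_obj = rng.choice(OBJECTS)
    new_task = task + (new_obj,)
    new_level = level | {new_obj}  # keep the level consistent with the task
    return new_task, new_level


def mutate_task_only(task, level, rng):
    """Baseline mutation: change the task and leave the level untouched."""
    return task + (rng.choice(OBJECTS),), level


rng = random.Random(0)
task, level = ("key",), frozenset({"key"})
for _ in range(5):
    task, level = mutate_pair(task, level, rng)

still_solvable = is_solvable(task, level)
```

By contrast, `mutate_task_only` can name an object the level does not contain, which reproduces the unsolvable pairs that independent sampling creates.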
Authors

Daniel Furelos-Blanco, Imperial College London (automated planning, machine learning, reinforcement learning)
Charles Pert, Imperial College London
Frederik Kelbel, Imperial College London
Alex F. Spies, Imperial College London
Alessandra Russo, Imperial College London
Michael Dennis, Google DeepMind (Open-Endedness, Unsupervised Environment Design, AI Safety)