Efficient Unsupervised Environment Design through Hierarchical Policy Representation Learning

📅 2026-02-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses inefficient environment generation in unsupervised curriculum design under resource constraints, where existing methods need many teacher–student interactions to produce training environments matched to the student's capabilities. The authors propose a framework that combines student policy representations with a hierarchical Markov decision process (HMDP), so that a teacher agent can generate targeted training environments as the student policy evolves. A generative model additionally synthesizes training data for the teacher, reducing the number of teacher–student interactions required. The work is described as the first to combine policy representation learning with an HMDP for environment design. Experiments across several domains show that the method outperforms baselines while using fewer teacher–student interactions, which suggests it is suited to settings with limited training opportunities.

📝 Abstract

Unsupervised Environment Design (UED) has emerged as a promising approach to developing general-purpose agents through automated curriculum generation. Popular UED methods focus on Open-Endedness, where teacher algorithms rely on stochastic processes for infinite generation of useful environments. This assumption becomes impractical in resource-constrained scenarios where teacher-student interaction opportunities are limited. To address this challenge, we introduce a hierarchical Markov Decision Process (MDP) framework for environment design. Our framework features a teacher agent that leverages student policy representations derived from discovered evaluation environments, enabling it to generate training environments based on the student's capabilities. To improve efficiency, we incorporate a generative model that augments the teacher's training dataset with synthetic data, reducing the need for teacher-student interactions. In experiments across several domains, we show that our method outperforms baseline approaches while requiring fewer teacher-student interactions in a single episode. The results suggest the applicability of our approach in settings where training opportunities are limited.
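
To make the teacher-student loop in the abstract more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. Everything in it is an assumption made for illustration: the scalar `skill` standing in for a student policy, the environment "difficulty" parameter, the fixed `EVAL_ENVS` list, the heuristic `teacher_action` rule (the paper's teacher is a learned agent), and the noise-based `augment` function, which only stands in for a learned generative model. The sketch shows three ideas: representing the student by its returns on evaluation environments, letting the teacher pick the next training environment from that representation under a fixed interaction budget, and enlarging the teacher's transition dataset with synthetic samples.

```python
import random
from dataclasses import dataclass
from statistics import mean

# Hypothetical evaluation environments, each described by one difficulty value.
EVAL_ENVS = [0.25, 0.5, 0.75, 1.0]


@dataclass
class Student:
    skill: float = 0.0  # toy stand-in for the student's policy parameters

    def evaluate(self, difficulty):
        # Toy return: 1.0 when the environment is within reach, lower as it gets harder.
        return max(0.0, 1.0 - max(0.0, difficulty - self.skill))

    def train(self, difficulty):
        # Toy learning rule: progress only on environments moderately above current skill.
        gap = difficulty - self.skill
        if 0.0 < gap <= 0.5:
            self.skill += 0.5 * gap


def policy_representation(student):
    # Represent the student by its returns on the evaluation environments.
    return [student.evaluate(d) for d in EVAL_ENVS]


def teacher_action(representation, mastery=0.95):
    # Heuristic teacher (assumption): train on the easiest environment not yet mastered.
    for difficulty, ret in zip(EVAL_ENVS, representation):
        if ret < mastery:
            return difficulty
    return EVAL_ENVS[-1]


def jitter(vec, rng, noise):
    # Add Gaussian noise to each entry and clip to the valid return range [0, 1].
    return [min(1.0, max(0.0, v + rng.gauss(0.0, noise))) for v in vec]


def augment(transitions, copies, rng, noise=0.02):
    # Placeholder for a generative model: noisy copies of real teacher transitions.
    synthetic = []
    for state, action, reward, next_state in transitions:
        for _ in range(copies):
            synthetic.append(
                (jitter(state, rng, noise), action, reward, jitter(next_state, rng, noise))
            )
    return synthetic


def run_teacher_student_loop(budget=8, copies=3, seed=0):
    rng = random.Random(seed)
    student = Student()
    real = []  # upper-level transitions: (state, action, reward, next_state)
    state = policy_representation(student)
    for _ in range(budget):  # each iteration is one teacher-student interaction
        action = teacher_action(state)
        student.train(action)
        next_state = policy_representation(student)
        reward = mean(next_state) - mean(state)  # improvement in evaluation return
        real.append((state, action, reward, next_state))
        state = next_state
    return student, real, augment(real, copies, rng)


if __name__ == "__main__":
    student, real, synthetic = run_teacher_student_loop()
    print(f"final skill: {student.skill:.3f}")
    print(f"real transitions: {len(real)}, synthetic transitions: {len(synthetic)}")
```

In this sketch the heuristic teacher never actually learns from the augmented dataset. In a setup closer to the one the abstract describes, the real and synthetic transitions would presumably be used to train the teacher's policy, which is where the savings in teacher-student interactions would come from.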

Problem

Research questions and friction points this paper is trying to address.

Unsupervised Environment Design

Resource-constrained Scenarios

Teacher-Student Interaction

Curriculum Generation

General-purpose Agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised Environment Design

Hierarchical MDP

Policy Representation Learning