TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design

2025-06-24
AI Summary
Deep reinforcement learning agents often generalize poorly, zero-shot, to unseen environments. Method: TRACED is an Unsupervised Environment Design (UED) framework in which a teacher adaptively generates tasks and a student learns a policy from the resulting curriculum. It approximates regret by adding a transition prediction error term to the usual value-function loss, and it introduces co-learnability, a lightweight metric that captures how training on one task affects performance on others. Combining the two measures guides how the curriculum's complexity adapts. Contribution/Results: TRACED improves zero-shot generalization across multiple benchmarks while requiring up to 2× fewer environment interactions than strong baselines. Ablation studies indicate that the transition prediction error drives rapid complexity ramp-up and that co-learnability delivers additional gains when paired with it.

๐Ÿ“ Abstract
Generalizing deep reinforcement learning agents to unseen environments remains a significant challenge. One promising solution is Unsupervised Environment Design (UED), a co-evolutionary framework in which a teacher adaptively generates tasks with high learning potential, while a student learns a robust policy from this evolving curriculum. Existing UED methods typically measure learning potential via regret, the gap between optimal and current performance, approximated solely by value-function loss. Building on these approaches, we introduce the transition prediction error as an additional term in our regret approximation. To capture how training on one task affects performance on others, we further propose a lightweight metric called co-learnability. By combining these two measures, we present Transition-aware Regret Approximation with Co-learnability for Environment Design (TRACED). Empirical evaluations show that TRACED yields curricula that improve zero-shot generalization across multiple benchmarks while requiring up to 2x fewer environment interactions than strong baselines. Ablation studies confirm that the transition prediction error drives rapid complexity ramp-up and that co-learnability delivers additional gains when paired with the transition prediction error. These results demonstrate how refined regret approximation and explicit modeling of task relationships can be leveraged for sample-efficient curriculum design in UED.
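In symbols, the regret described in the abstract can be sketched as follows. The notation is an assumption made here for illustration (task parameters \theta, student policy \pi, expected return V, and a weight \lambda on the transition term); the abstract does not state the exact form of the approximation, so the second line is only one plausible reading of "an additional term".

```latex
\begin{align*}
\mathrm{Regret}(\pi, \theta)
  &= \max_{\pi'} V^{\pi'}(\theta) - V^{\pi}(\theta)
  && \text{(gap between optimal and current performance)} \\
\widehat{\mathrm{Regret}}(\pi, \theta)
  &\approx \mathcal{L}_{\mathrm{value}}(\pi, \theta)
   + \lambda \, \mathcal{L}_{\mathrm{trans}}(\pi, \theta)
  && \text{(value-function loss plus transition prediction error; } \lambda \text{ assumed)}
\end{align*}
```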
Problem

Research questions and friction points this paper is trying to address.

Improving generalization of RL agents to unseen environments
Enhancing regret approximation with transition prediction error
Measuring task co-learnability for efficient curriculum design
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces transition prediction error in regret approximation
Proposes lightweight co-learnability metric for task relationships
Combines both measures for efficient curriculum design
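The three points above can be illustrated with a minimal Python sketch of how a teacher might rank tasks. It is not the authors' implementation: the `TaskStats` fields, the additive form of the score, and the weights `alpha` and `beta` are all hypothetical, since this summary gives no formulas. In particular, co-learnability is treated here simply as a precomputed number per task.

```python
from dataclasses import dataclass


@dataclass
class TaskStats:
    """Per-task statistics a teacher might track (hypothetical fields)."""
    value_loss: float        # mean value-function loss on the task
    transition_error: float  # mean transition prediction error on the task
    co_learnability: float   # assumed: how much training here helps other tasks


def approx_regret(stats: TaskStats, alpha: float = 1.0) -> float:
    """Regret proxy: value loss plus a weighted transition error (assumed form)."""
    return stats.value_loss + alpha * stats.transition_error


def task_priority(stats: TaskStats, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Combine the regret proxy with co-learnability (assumed additive form)."""
    return approx_regret(stats, alpha) + beta * stats.co_learnability


def rank_tasks(buffer: dict[str, TaskStats], alpha: float = 1.0, beta: float = 1.0) -> list[str]:
    """Return task names ordered from highest to lowest priority."""
    return sorted(
        buffer,
        key=lambda name: task_priority(buffer[name], alpha, beta),
        reverse=True,
    )


if __name__ == "__main__":
    buffer = {
        "easy": TaskStats(value_loss=0.1, transition_error=0.05, co_learnability=0.0),
        "hard": TaskStats(value_loss=0.4, transition_error=0.3, co_learnability=0.2),
    }
    print(rank_tasks(buffer))
```

Under these assumptions, a task with higher value loss, higher transition error, and higher co-learnability is replayed first; setting `beta=0` recovers a ranking based on the regret proxy alone.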
Geonwoo Cho
Gwangju Institute of Science and Technology
Reinforcement Learning
Jaegyun Im
Gwangju Institute of Science and Technology
Jihwan Lee
Gwangju Institute of Science and Technology
Hojun Yi
Gwangju Institute of Science and Technology
Sejin Kim
Gwangju Institute of Science and Technology
Sundong Kim
Assistant Professor, GIST
AGI, Artificial Intelligence, Machine Learning, Data Mining