🤖 AI Summary
Automatic curriculum generation in reinforcement learning faces challenges including slow convergence, poor generalization across tasks, and limited interpretability. To address these, we propose the Skill-Environment Bayesian Network (SEBN), the first probabilistic graphical model that jointly represents skills, reward objectives, and environmental features, enabling interpretable curriculum design, cross-task generalization, and principled uncertainty quantification. Building on SEBN, we develop an expectation-based task selection algorithm that predicts policy performance on unseen tasks and weighs candidate next training tasks by their expected performance gain. We evaluate our method across three distinct domains: discrete grid worlds, continuous control benchmarks, and simulated robotic manipulation. Compared to hand-crafted curricula and state-of-the-art baselines, our approach improves training efficiency by 32% on average and enhances final policy performance by 18–27%.
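One generic way to write down the two steps described above (predicting success on a task, then choosing what to train on next) is sketched below. The notation is an illustrative assumption, not taken from the paper: $\mathbf{s}$ is a configuration of binary skill variables whose distribution $P(\mathbf{s})$ reflects the current belief about what the agent has mastered, a task $\tau = (g, e)$ pairs a goal $g$ with environment features $e$, and $\tau^\ast$ is the target task.

```latex
% Predicted success on a (possibly unseen) task: marginalize over skill configurations
P(\text{success} \mid \tau) = \sum_{\mathbf{s}} P(\text{success} \mid \mathbf{s}, g, e)\, P(\mathbf{s})

% Expected improvement on the target from training on candidate task \tau
\mathrm{EI}(\tau) = \mathbb{E}\big[\, P(\text{success} \mid \tau^\ast) \,\big|\, \text{train on } \tau \,\big] - P(\text{success} \mid \tau^\ast)

% Greedy choice of the next task (sampling in proportion to EI is an alternative)
\tau_{\text{next}} = \arg\max_{\tau} \mathrm{EI}(\tau)
```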
📝 Abstract
A major challenge for reinforcement learning is automatically generating curricula to reduce training time or improve performance in some target task. We introduce SEBNs (Skill-Environment Bayesian Networks), which model a probabilistic relationship between a set of skills, a set of goals that relate to the reward structure, and a set of environment features to predict policy performance on (possibly unseen) tasks. We develop an algorithm that uses the inferred estimates of agent success from SEBN to weigh the possible next tasks by expected improvement. We evaluate the benefit of the resulting curriculum on three environments: a discrete gridworld, continuous control, and simulated robotics. The results show that curricula constructed using SEBN frequently outperform other baselines.
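To make the prediction and task-weighting steps concrete, here is a minimal Python sketch under heavily simplified assumptions. Everything in it is hypothetical and not the paper's method: the three skill names, the four gridworld-style tasks, the independent skill priors in `MASTERY`, the noisy-AND conditional probability (`slip`, `guess`), and the learning model in `after_training`, which nudges a task's required skills toward mastery in proportion to predicted success on that task. Inference is brute-force enumeration over skill configurations, which is exact but only practical for a handful of skills.

```python
from itertools import product

# Hypothetical skill nodes: each is a binary "mastered?" variable in a toy network.
SKILLS = ["navigate", "open_door", "pick_up"]

# Hypothetical tasks: a goal (tied to the reward) plus environment features.
TASKS = {
    "empty_room": {"goal": "reach", "door": False, "object": False},
    "door_room": {"goal": "reach", "door": True, "object": False},
    "fetch": {"goal": "fetch", "door": False, "object": True},
    "target": {"goal": "fetch", "door": True, "object": True},
}


def required_skills(task):
    """Assumed mapping from a task's goal and features to the skills it needs."""
    needed = {"navigate"}
    if task["door"]:
        needed.add("open_door")
    if task["goal"] == "fetch" and task["object"]:
        needed.add("pick_up")
    return needed


def success_cpt(mastered, task, slip=0.1, guess=0.02):
    """Noisy-AND CPT: success is likely only if every required skill is mastered."""
    return 1.0 - slip if required_skills(task) <= mastered else guess


def predict_success(mastery, task):
    """Inference by enumeration: sum over s of P(success | s, task) * P(s)."""
    total = 0.0
    for bits in product([False, True], repeat=len(SKILLS)):
        weight = 1.0  # prior probability of this skill configuration (skills independent)
        for skill, on in zip(SKILLS, bits):
            weight *= mastery[skill] if on else 1.0 - mastery[skill]
        mastered = {s for s, on in zip(SKILLS, bits) if on}
        total += weight * success_cpt(mastered, task)
    return total


def after_training(mastery, task, rate=0.5):
    """Assumed learning model: required skills improve in proportion to predicted success."""
    gain = rate * predict_success(mastery, task)
    needed = required_skills(task)
    return {s: (p + gain * (1.0 - p)) if s in needed else p for s, p in mastery.items()}


def weigh_next_tasks(mastery, candidates, target):
    """Score candidates by expected improvement on the target task, then normalize."""
    baseline = predict_success(mastery, TASKS[target])
    gains = {
        name: max(0.0, predict_success(after_training(mastery, TASKS[name]), TASKS[target]) - baseline)
        for name in candidates
    }
    total = sum(gains.values())
    if total == 0.0:  # no candidate helps: fall back to uniform weights
        return {name: 1.0 / len(candidates) for name in candidates}
    return {name: g / total for name, g in gains.items()}


MASTERY = {"navigate": 0.9, "open_door": 0.2, "pick_up": 0.5}
weights = weigh_next_tasks(MASTERY, list(TASKS), "target")
best = max(weights, key=weights.get)  # "door_room" for these numbers
print(best, {name: round(w, 3) for name, w in weights.items()})
```

With these made-up numbers the sketch puts the most weight on `door_room`: it exercises the weakest skill (`open_door`) on a task the agent can already partly solve, which raises predicted success on `target` more than training on `target` directly. Sampling the next task from `weights` is one reading of "weigh the possible next tasks"; always taking the argmax is the greedy alternative.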