Autonomous Curriculum Design via Relative Entropy Based Task Modifications

📅 2025-02-28
🤖 AI Summary
This work addresses autonomous curriculum learning by proposing an automated curriculum generation method that does not rely on human knowledge. It uses relative entropy to measure uncertainty in the learner's policy and guides the agent toward high-uncertainty states, with the aim of shortening training on the target task. Convergence guarantees are derived using two-timescale stochastic optimization, and the method supports both self-assessed curriculum design, which draws on the learner's past and current policies, and teacher-guided design. A heuristic metric that integrates KL divergence with state-transition distance enables dynamic selection of curriculum tasks. Empirical evaluation on reinforcement learning benchmarks shows the method outperforming randomly generated curricula, direct training on the target task, and existing curriculum-learning criteria.

📝 Abstract
Curriculum learning is a training method in which an agent is first trained on a curriculum of relatively simple tasks related to a target task, in an effort to shorten the time required to train on the target task. Autonomous curriculum design involves designing such curricula with no reliance on human knowledge or expertise. Finding an efficient and effective way of autonomously designing curricula remains an open problem. We propose a novel approach for automatically designing curricula by leveraging the learner's uncertainty to select curriculum tasks. Our approach measures the uncertainty in the learner's policy using relative entropy, and guides the agent to states of high uncertainty to facilitate learning. Our algorithm supports the generation of autonomous curricula in a self-assessed manner by leveraging the learner's past and current policies, but it also allows the use of teacher-guided design in an instructive setting. We provide theoretical guarantees for the convergence of our algorithm using two time-scale optimization processes. Results show that our algorithm outperforms randomly generated curricula, learning directly on the target task, and the curriculum-learning criteria existing in the literature. We also present two additional heuristic distance measures that could be combined with our relative-entropy approach for further performance improvements.
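To make the relative-entropy idea in the abstract concrete, here is a minimal sketch, not the paper's algorithm. It assumes tabular policies stored as per-state action distributions, scores each state by the KL divergence between the learner's current and past policy, and picks the highest-scoring state as the next curriculum task. The function names, the direction of the divergence, and the toy policies are all assumptions made for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Relative entropy KL(p || q) between two discrete distributions."""
    # Terms with p_i == 0 contribute nothing; eps guards against log(0).
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def most_uncertain_state(current_policy, past_policy):
    """Return the state whose action distribution changed most, plus all scores.

    Both arguments map a state label to a list of action probabilities.
    Treating a large policy change as "high uncertainty" is an assumption
    of this sketch.
    """
    scores = {
        state: kl_divergence(current_policy[state], past_policy[state])
        for state in current_policy
    }
    best_state = max(scores, key=scores.get)
    return best_state, scores


# Hypothetical three-state, two-action policies (illustrative values only).
past_policy = {"s0": [0.5, 0.5], "s1": [0.5, 0.5], "s2": [0.9, 0.1]}
current_policy = {"s0": [0.5, 0.5], "s1": [0.9, 0.1], "s2": [0.9, 0.1]}

next_task_state, kl_scores = most_uncertain_state(current_policy, past_policy)
print(next_task_state, kl_scores)
```

In this toy example only the policy at `s1` changed between snapshots, so `s1` receives the only nonzero score and would be chosen as the start of the next curriculum task.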
Problem

Research questions and friction points this paper is trying to address.

Autonomous curriculum design without human expertise.
Using relative entropy to measure learner uncertainty.
Improving training efficiency on target tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autonomous curriculum design via relative entropy.
Leverages the learner's uncertainty for task selection.
Combines self-assessed and teacher-guided curriculum generation.
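
For orientation on the convergence claim, the block below states the relative entropy between two policies at a state and the textbook form of a two time-scale stochastic approximation scheme. This is the generic formulation, not the paper's own construction: the symbols $\pi$, $\pi'$, $x_n$, $y_n$, $h$, $g$, $a_n$, $b_n$ and the noise terms are assumptions introduced here, and which of the paper's quantities runs on the fast or slow time scale is not stated in the abstract.

```latex
% Relative entropy (KL divergence) between policies \pi and \pi' at state s
D_{\mathrm{KL}}\bigl(\pi(\cdot \mid s) \,\|\, \pi'(\cdot \mid s)\bigr)
  = \sum_{a} \pi(a \mid s) \, \log \frac{\pi(a \mid s)}{\pi'(a \mid s)}

% Generic two time-scale iteration with noise terms M_{n+1}, M'_{n+1}
x_{n+1} = x_n + a_n \bigl[ h(x_n, y_n) + M_{n+1} \bigr], \qquad
y_{n+1} = y_n + b_n \bigl[ g(x_n, y_n) + M'_{n+1} \bigr]

% Standard step-size conditions: y_n evolves on the slower time scale
\sum_n a_n = \sum_n b_n = \infty, \qquad
\sum_n \bigl( a_n^2 + b_n^2 \bigr) < \infty, \qquad
\lim_{n \to \infty} \frac{b_n}{a_n} = 0
```

Under these conditions the fast iterate $x_n$ sees $y_n$ as nearly constant, which is what lets the two updates be analyzed separately.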