🤖 AI Summary
This work addresses a limitation of traditional interpolation-based curriculum reinforcement learning: in complex navigation tasks the task space is non-Euclidean, so distances in that space do not reliably measure task similarity and automatic curriculum generation fails. To overcome this, the authors propose a metric-aware task representation learning approach that uses a variational autoencoder to encode both reward functions and state transitions, mapping the original tasks into a latent embedding space in which distance reflects task similarity. Based on this learned representation, they devise an automatic curriculum generation mechanism that does not rely on assumptions about the structure of the underlying task space. Empirical results show that the proposed method outperforms existing curriculum reinforcement learning approaches, particularly those based on interpolation or generative adversarial networks, across a range of challenging navigation tasks.
📝 Abstract
In curriculum reinforcement learning (CRL), an agent incrementally accumulates knowledge over a sequence of tasks (i.e., a curriculum), with the aim of using that knowledge to eventually solve a challenging target task. While early CRL work focused on sequencing candidate tasks, recent research explores automatic curriculum generation. A major line of this work is interpolation-based CRL, which automatically generates intermediate tasks by interpolating between the initial task distribution and the target task distribution. This paradigm assumes a task space equipped with a meaningful distance metric, i.e., one that can measure task similarity. However, in challenging navigation tasks, the non-Euclidean context (task) space invalidates this assumption. To achieve automatic curriculum generation in complex tasks, we propose a novel approach based on measurable task representation learning. To better measure similarity, we transform the task space into a latent space. Using a variational autoencoder structure that encodes the reward and the state transitions, we obtain a latent task representation in which distance reflects task similarity: two nearby task embeddings correspond to two tasks that are similar in terms of rewards and state transitions. Based on the learned task representation, we further develop an automatic curriculum generation scheme that generates new tasks increasingly similar to the target task. We evaluate our method on a variety of challenging navigation tasks, and the experimental results indicate that the proposed approach surpasses state-of-the-art CRL approaches based on interpolation and generative adversarial networks.
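To make the idea of generating tasks "increasingly similar to the target task" concrete, here is a minimal sketch of curriculum construction over task embeddings. It is not the authors' scheme: the greedy selection rule, the `max_step` bound, and the function names are assumptions for illustration, and the embeddings are assumed to come from an already trained encoder that is not shown.

```python
import math


def latent_distance(z_a, z_b):
    """Euclidean distance between two task embeddings (assumed similarity proxy)."""
    return math.dist(z_a, z_b)


def next_task(current, target, candidates, max_step):
    """Pick the candidate closest to the target among those near the current task.

    Only candidates within `max_step` of `current` are considered, so each new
    task stays similar to the one the agent has just trained on. Returns
    `current` unchanged if no candidate is strictly closer to the target.
    """
    best = current
    best_dist = latent_distance(current, target)
    for z in candidates:
        if latent_distance(current, z) <= max_step:
            dist = latent_distance(z, target)
            if dist < best_dist:
                best, best_dist = z, dist
    return best


def build_curriculum(start, target, candidates, max_step, max_len=50):
    """Chain greedy steps from `start` toward `target` in the latent space."""
    curriculum = [start]
    current = start
    for _ in range(max_len):
        nxt = next_task(current, target, candidates, max_step)
        if nxt == current:  # no further progress is possible
            break
        curriculum.append(nxt)
        current = nxt
    return curriculum


# Hypothetical 2-D embeddings; real embeddings would come from the encoder.
candidates = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (0.0, 5.0)]
curriculum = build_curriculum((0.0, 0.0), (3.0, 0.0), candidates, max_step=1.5)
print(curriculum)
```

The sketch depends only on distances between embeddings, which is why it makes no assumption about the structure of the original task space; whether those distances are meaningful rests entirely on the learned representation.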