🤖 AI Summary
This study addresses the challenge of automating reinforcement learning (RL) algorithm design, which traditionally relies on manual hyperparameter tuning and architectural engineering, via meta-learning. We systematically compare four meta-learning paradigms: evolutionary black-box optimization, large language model (LLM)-driven code generation, meta-training/meta-testing frameworks, and modular meta-learning strategies, each targeting core RL components such as policy updates, reward shaping, and exploration mechanisms. Addressing the lack of such comparisons in prior work, we conduct an empirical evaluation on a unified benchmark along four dimensions: performance gain, sample efficiency, interpretability, and training cost. Our results delineate the applicability boundaries of and inherent trade-offs among these approaches, from which we distill several practical guidelines for efficient, automated RL algorithm generation. This work provides both empirical foundations and engineering insights for meta-learning-driven automated AI system design.
📝 Abstract
The process of meta-learning algorithms from data, instead of relying on manual design, is growing in popularity as a paradigm for improving the performance of machine learning systems. Meta-learning shows particular promise for reinforcement learning (RL), where algorithms are often adapted from supervised or unsupervised learning despite their suboptimality for RL. However, until now there has been a severe lack of comparison between different meta-learning algorithms, such as using evolution to optimise over black-box functions or LLMs to propose code. In this paper, we carry out an empirical comparison of these approaches when applied to a range of meta-learned algorithms that target different parts of the RL pipeline. In addition to meta-train and meta-test performance, we also investigate factors including the interpretability, sample cost, and training time of each meta-learning algorithm. Based on these findings, we propose several guidelines for meta-learning new RL algorithms, which will help ensure that future learned algorithms are as performant as possible.