🤖 AI Summary
Existing meta-learning methods suffer from limited generalizability, often being confined to specific algorithms or requiring differentiability assumptions. This paper proposes a general reinforcement learning–driven meta-learning framework that trains a teacher policy to dynamically guide arbitrary student algorithms—without imposing structural or differentiability constraints on the student. Key contributions include: (i) the first unified pedagogical paradigm for meta-learning; (ii) a parameter-behavior encoder that implicitly infers the student’s internal parameter state from its input-output behavior; and (iii) a reward function grounded in learning progress. Experiments across supervised and reinforcement learning tasks demonstrate that our framework significantly outperforms baselines relying on heuristic rewards and handcrafted state representations, validating its broad generalizability and empirical effectiveness.
📝 Abstract
Machine learning algorithms learn to solve a task, but are unable to improve their ability to learn. Meta-learning methods learn about machine learning algorithms and improve them so that they learn more quickly. However, existing meta-learning methods are either hand-crafted to improve one specific component of an algorithm or only work with differentiable algorithms. We develop a unifying meta-learning framework, called Reinforcement Teaching, to improve the learning process of emph{any} algorithm. Under Reinforcement Teaching, a teaching policy is learned, through reinforcement, to improve a student's learning algorithm. To learn an effective teaching policy, we introduce the parametric-behavior embedder that learns a representation of the student's learnable parameters from its input/output behavior. We further use learning progress to shape the teacher's reward, allowing it to more quickly maximize the student's performance. To demonstrate the generality of Reinforcement Teaching, we conduct experiments in which a teacher learns to significantly improve both reinforcement and supervised learning algorithms. Reinforcement Teaching outperforms previous work using heuristic reward functions and state representations, as well as other parameter representations.