Can Learned Optimization Make Reinforcement Learning Less Difficult?

📅 2024-07-09
🏛️ Neural Information Processing Systems
📈 Citations: 3
Influential: 0
🤖 AI Summary
Reinforcement learning (RL) faces three key challenges: non-stationarity, plasticity loss, and insufficient exploration. To address them, we propose OPEN, a meta-learned neural optimizer whose input features and output structure are designed to handle plasticity, exploration, and non-stationarity within a single update rule. OPEN injects learned stochasticity into its parameter updates to drive exploration and uses joint meta-training across multiple environments to improve generalization. Meta-trained on a single environment or a small set of environments, OPEN matches or surpasses Adam. Crucially, it transfers zero-shot to unseen environments and diverse RL agent architectures without fine-tuning. These results support data-driven, learned parameter-update rules as a way to improve RL robustness and adaptability under non-stationary conditions.
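To make the idea concrete, the following is a minimal, hedged sketch of a learned update rule with an explicit stochastic term. It is not OPEN's actual architecture (which meta-learns a small neural network over a richer set of input features); the feature set, the linear `meta_weights` stand-in, and the `noise_scale` parameter are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def learned_update(grad, momentum, train_frac, meta_weights, noise_scale=0.01):
    """Sketch of a learned optimizer step: per-parameter features are mapped
    to an update by meta-learned weights, plus a stochastic exploration term.
    This is an illustration, not the OPEN update rule."""
    # Per-parameter input features: gradient, momentum, training progress.
    feats = np.stack([grad, momentum, np.full_like(grad, train_frac)], axis=-1)
    # meta_weights stands in for the meta-learned network (here a 3-vector).
    deterministic = feats @ meta_weights
    # Stochastic component: the source of exploration noise in the update.
    stochastic = noise_scale * rng.standard_normal(grad.shape)
    return deterministic + stochastic

# Usage on dummy policy parameters:
params = np.zeros(4)
grad = np.array([0.5, -0.2, 0.1, 0.0])
momentum = 0.1 * grad
meta_weights = np.array([-0.01, -0.005, 0.0])  # would be found by meta-training
params = params + learned_update(grad, momentum, train_frac=0.1,
                                 meta_weights=meta_weights)
```

The key contrast with a hand-designed optimizer like Adam is that `meta_weights` (in OPEN, a network's parameters) are themselves optimized over whole RL training runs, so the update rule can adapt to non-stationarity and inject noise where exploration helps.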

📝 Abstract
While reinforcement learning (RL) holds great potential for decision making in the real world, it suffers from a number of unique difficulties which often need specific consideration. In particular: it is highly non-stationary; suffers from high degrees of plasticity loss; and requires exploration to prevent premature convergence to local optima and maximize return. In this paper, we consider whether learned optimization can help overcome these problems. Our method, Learned Optimization for Plasticity, Exploration and Non-stationarity (OPEN), meta-learns an update rule whose input features and output structure are informed by previously proposed solutions to these difficulties. We show that our parameterization is flexible enough to enable meta-learning in diverse learning contexts, including the ability to use stochasticity for exploration. Our experiments demonstrate that when meta-trained on single and small sets of environments, OPEN outperforms or equals traditionally used optimizers. Furthermore, OPEN shows strong generalization characteristics across a range of environments and agent architectures.
Problem

Research questions and friction points this paper is trying to address.

Overcoming non-stationarity in reinforcement learning
Addressing plasticity loss in reinforcement learning
Enhancing exploration to avoid local optima
Innovation

Methods, ideas, or system contributions that make the work stand out.

Meta-learns update rule for RL optimization
Uses stochasticity for exploration enhancement
Generalizes across diverse environments effectively