Meta-Learning to Explore via Memory Density Feedback

📅 2025-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of efficient intra-episode exploration of unseen states in reinforcement learning. We propose a meta-learning-driven active exploration framework that, for the first time, integrates nonparametric memory-based density estimation with meta-learning to construct a dynamic memory model. Intrinsic rewards are derived from the low probability density of novel observations under this model, guiding a recurrent policy network to optimize exploration behavior online within a single episode. Our contributions are threefold: (1) unsupervised, online exploration policy adaptation via density-based feedback; (2) elimination of reliance on environmental priors, state visitation counts, or prediction-error signals; and (3) zero-shot transferability to novel sparse-reward tasks. Experiments demonstrate significant improvements in both exploration efficiency and generalization over conventional exploration methods.
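The summary's central mechanism is an intrinsic reward computed from the probability density of the current observation under a nonparametric model of the agent's memories. The paper does not specify the estimator here, so the sketch below assumes a Gaussian-kernel density over an episodic memory buffer; the function names, bandwidth, and reward form (negative log-density) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def kde_density(obs, memory, bandwidth=0.5):
    """Gaussian-kernel density of `obs` under the stored memories (assumed estimator)."""
    if len(memory) == 0:
        return 0.0  # empty memory: everything is maximally novel
    diffs = np.asarray(memory) - obs                  # (N, D) differences to each memory
    sq = np.sum(diffs ** 2, axis=1)                   # squared distances
    d = obs.shape[0]
    norm = (2 * np.pi * bandwidth ** 2) ** (-d / 2)   # Gaussian normalizer
    return float(np.mean(norm * np.exp(-sq / (2 * bandwidth ** 2))))

def intrinsic_reward(obs, memory, bandwidth=0.5, eps=1e-8):
    """Reward is large where the memory model assigns low density (novel states)."""
    return -np.log(kde_density(obs, memory, bandwidth) + eps)
```

With a memory of near-identical observations, a repeated observation yields a small reward while a distant one yields a large reward, which is the density-feedback signal the summary describes.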

📝 Abstract
Exploration algorithms for reinforcement learning typically replace or augment the reward function with an additional "intrinsic" reward that trains the agent to seek previously unseen states of the environment. Here, we consider an exploration algorithm that exploits meta-learning, or learning to learn, such that the agent learns to maximize its exploration progress within a single episode, even between epochs of training. The agent learns a policy that aims to minimize the probability density of new observations with respect to all of its memories. In addition, it receives as feedback evaluations of the current observation density and retains that feedback in a recurrent network. By remembering trajectories of density, the agent learns to navigate a complex and growing landscape of familiarity in real-time, allowing it to maximize its exploration progress even in completely novel states of the environment for which its policy has not been trained.
Problem

Research questions and friction points this paper is trying to address.

Develops meta-learning for reinforcement learning exploration.
Maximizes exploration progress within single episodes.
Uses memory density feedback to navigate novel states.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Meta-learning enhances exploration in reinforcement learning.
Agent minimizes new observation probability density using memory.
Recurrent network retains feedback for real-time navigation.
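The innovation bullets above describe a loop in which the policy receives density feedback each step and retains it in a recurrent state, while the memory grows online within the episode. The following is a minimal, untrained sketch of that loop; the dimensions, the tanh recurrent update, the linear readout, and the simple additive dynamics are all hypothetical stand-ins for the paper's unspecified architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def density(obs, memory, bandwidth=0.5):
    """Gaussian-kernel density of `obs` under the episode's memory buffer (assumed)."""
    if not memory:
        return 0.0
    sq = np.sum((np.asarray(memory) - obs) ** 2, axis=1)
    d = obs.shape[0]
    norm = (2 * np.pi * bandwidth ** 2) ** (-d / 2)
    return float(np.mean(norm * np.exp(-sq / (2 * bandwidth ** 2))))

# Hypothetical sizes; the actual network is not specified in this entry.
obs_dim, hidden_dim, act_dim = 2, 8, 2
W_in = rng.normal(scale=0.1, size=(hidden_dim, obs_dim + 1))  # input: obs + density feedback
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))    # recurrent weights
W_out = rng.normal(scale=0.1, size=(act_dim, hidden_dim))     # action readout

memory, h = [], np.zeros(hidden_dim)
obs = np.zeros(obs_dim)
for t in range(20):
    p = density(obs, memory)               # density feedback for the current observation
    memory.append(obs.copy())              # memory grows online within the episode
    x = np.concatenate([obs, [p]])         # the policy sees the observation AND its familiarity
    h = np.tanh(W_in @ x + W_h @ h)        # recurrent state retains the density trajectory
    action = W_out @ h                     # untrained linear readout, for illustration only
    obs = obs + 0.1 * action + 0.05 * rng.normal(size=obs_dim)  # toy environment step
```

The key design point mirrored here is that novelty is not a count or a prediction error: the recurrent state integrates a trajectory of density evaluations, so the agent can steer toward low-density regions even in states its policy was never trained on.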