Real-Time Recurrent Reinforcement Learning

📅 2023-11-08
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address real-time policy learning and limited biological plausibility in partially observable Markov decision processes (POMDPs), this paper proposes a biologically inspired reinforcement learning framework, real-time recurrent reinforcement learning (RTRRL). It takes the mammalian basal ganglia as an architectural blueprint and combines three parts: a meta-RL architecture, temporal-difference (TD) learning with eligibility traces for training the policy and the value function, and an online automatic differentiation algorithm that computes gradients for a shared recurrent neural network (RNN) backbone. Experiments show that the method solves a diverse set of partially observable reinforcement learning tasks. The authors present RTRRL as a model of learning in biological neural networks that mimics reward pathways in the basal ganglia.
📝 Abstract
We introduce a biologically plausible RL framework for solving tasks in partially observable Markov decision processes (POMDPs). The proposed algorithm combines three integral parts: (1) a meta-RL architecture resembling the mammalian basal ganglia; (2) a biologically plausible reinforcement learning algorithm that uses temporal-difference learning and eligibility traces to train the policy and the value function; (3) an online automatic differentiation algorithm for computing the gradients with respect to the parameters of a shared recurrent network backbone. Our experimental results show that the method is capable of solving a diverse set of partially observable reinforcement learning tasks. The algorithm, which we call real-time recurrent reinforcement learning (RTRRL), serves as a model of learning in biological neural networks, mimicking reward pathways in the basal ganglia.
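To make the second ingredient concrete, here is a minimal sketch of temporal-difference learning with eligibility traces. It is a generic tabular TD(λ) value-prediction example on a toy random-walk chain, not the paper's algorithm: the environment, the function name `td_lambda_random_walk`, and all hyperparameter values are assumptions chosen for illustration.

```python
import random


def td_lambda_random_walk(n_states=5, episodes=5000, alpha=0.005,
                          gamma=1.0, lam=0.8, seed=0):
    """Tabular TD(lambda) value prediction on a toy random-walk chain.

    Hypothetical environment: states 0..n_states-1, each step moves left or
    right with equal probability. Leaving on the left ends the episode with
    reward 0, leaving on the right ends it with reward 1.
    """
    rng = random.Random(seed)
    v = [0.0] * n_states                  # value estimates
    for _ in range(episodes):
        e = [0.0] * n_states              # eligibility traces, reset per episode
        s = n_states // 2                 # start in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))
            done = s_next < 0 or s_next >= n_states
            reward = 1.0 if s_next >= n_states else 0.0
            v_next = 0.0 if done else v[s_next]
            delta = reward + gamma * v_next - v[s]    # TD error
            # Decay every trace, then bump the trace of the current state
            # (accumulating trace).
            for i in range(n_states):
                e[i] *= gamma * lam
            e[s] += 1.0
            # Every state is updated in proportion to its trace, so credit
            # for the TD error reaches recently visited states online.
            for i in range(n_states):
                v[i] += alpha * delta * e[i]
            if done:
                break
            s = s_next
    return v


if __name__ == "__main__":
    print(td_lambda_random_walk())
```

For this chain the true values increase linearly from left to right, so the returned estimates should rise with the state index. The traces are what let a single scalar TD error update many past states without storing the trajectory, which is the property that makes this family of updates usable online.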
Problem

Research questions and friction points this paper is trying to address.

Develops a biologically plausible RL framework for POMDPs.
Combines meta-RL, temporal-difference learning with eligibility traces, and online automatic differentiation.
Solves a diverse set of partially observable reinforcement learning tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Meta-RL architecture resembling the mammalian basal ganglia.
Biologically plausible RL using temporal-difference learning and eligibility traces.
Online automatic differentiation for a shared recurrent network backbone.
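
The last item, computing recurrent gradients online, can be illustrated with a small sketch of real-time recurrent learning (RTRL) on a one-unit RNN. This is an assumption-laden toy, not the paper's implementation: the scalar recurrence `h_t = tanh(w * h_{t-1} + u * x_t)`, the squared-error loss, and the name `rtrl_scalar_rnn` are all hypothetical choices made to keep the example short.

```python
import math


def rtrl_scalar_rnn(xs, targets, w, u):
    """Run h_t = tanh(w * h_{t-1} + u * x_t) and accumulate gradients online.

    Returns the summed squared-error loss and its gradients with respect to
    w and u. Only the current state and its sensitivities are kept, so no
    past activations need to be stored.
    """
    h = 0.0
    dh_dw = 0.0            # sensitivity of the current state to w
    dh_du = 0.0            # sensitivity of the current state to u
    loss = grad_w = grad_u = 0.0
    for x, y in zip(xs, targets):
        h_new = math.tanh(w * h + u * x)
        d = 1.0 - h_new * h_new                 # derivative of tanh
        # Chain rule: direct dependence on the parameter plus the
        # dependence carried through the previous state.
        dh_dw = d * (h + w * dh_dw)
        dh_du = d * (x + w * dh_du)
        h = h_new
        err = h - y
        loss += 0.5 * err * err
        # The gradient contribution of this step is available immediately.
        grad_w += err * dh_dw
        grad_u += err * dh_du
    return loss, grad_w, grad_u


if __name__ == "__main__":
    xs = [0.5, -1.0, 0.3, 0.8]
    ys = [0.1, -0.2, 0.0, 0.4]
    print(rtrl_scalar_rnn(xs, ys, w=0.7, u=-0.4))
```

The sensitivities are carried forward in time alongside the state, in contrast to backpropagation through time, which has to store the sequence and sweep backward over it. For a fully connected network with many hidden units the sensitivity tensor grows quickly, so practical variants usually restrict or approximate it.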