Local Reinforcement Learning with Action-Conditioned Root Mean Squared Q-Functions

📅 2025-10-08
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of adapting the Forward-Forward (FF) algorithm to reinforcement learning (RL), where gradient-based credit assignment is typically indispensable. We propose ARQ, the first backpropagation-free, local RL method grounded in FF principles. ARQ introduces an action-conditioned root-mean-square Q-function as a layer-wise "goodness" metric, constructs local learning signals from per-layer activation statistics, and integrates temporal-difference updates for end-to-end policy optimization. Its core contributions are threefold: (i) the first adaptation of the forward-forward paradigm to RL; (ii) a novel action-aware local value estimation mechanism; and (iii) complete elimination of gradient backpropagation. Empirical evaluation on MinAtar and the DeepMind Control Suite demonstrates that ARQ significantly outperforms existing backprop-free RL methods and surpasses standard backpropagation baselines on most tasks.

πŸ“ Abstract
The Forward-Forward (FF) Algorithm is a recently proposed learning procedure for neural networks that employs two forward passes instead of the traditional forward and backward passes used in backpropagation. However, FF remains largely confined to supervised settings, leaving a gap in domains where learning signals arise more naturally, such as RL. In this work, inspired by FF's goodness function using layer activity statistics, we introduce Action-conditioned Root mean squared Q-Functions (ARQ), a novel value estimation method that applies a goodness function and action conditioning for local RL using temporal difference learning. Despite its simplicity and biological grounding, our approach achieves superior performance compared to state-of-the-art local backprop-free RL methods on the MinAtar and DeepMind Control Suite benchmarks, while also outperforming algorithms trained with backpropagation on most tasks. Code can be found at https://github.com/agentic-learning-ai-lab/arq.
Problem

Research questions and friction points this paper is trying to address.

Extends Forward-Forward algorithm beyond supervised learning to reinforcement learning domains
Introduces action-conditioned value estimation using goodness functions for local learning
Enables biologically plausible RL without backpropagation while matching backprop performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Action-conditioned Root mean squared Q-Functions
Applies goodness function for local reinforcement learning
Uses temporal difference learning without backpropagation
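The ideas in the bullets above can be illustrated with a minimal sketch: a single layer whose root-mean-square activation, conditioned on the action, serves as the Q-estimate, updated by a temporal-difference error computed entirely from that layer's own activations. This is a hedged illustration of the general technique, not the paper's implementation; the dimensions, learning rate, and gradient details are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and hyperparameters (not from the paper).
obs_dim, n_actions = 4, 3
hidden_dim = 16
gamma, lr = 0.99, 1e-2

# One local layer; its weights are updated only from its own signal,
# with no gradient chain through other layers.
W = rng.normal(scale=0.1, size=(obs_dim + n_actions, hidden_dim))

def one_hot(action):
    v = np.zeros(n_actions)
    v[action] = 1.0
    return v

def layer_forward(obs, action):
    """Action-conditioned forward pass: concatenate observation and action."""
    x = np.concatenate([obs, one_hot(action)])
    h = np.maximum(x @ W, 0.0)  # ReLU activations
    return x, h

def goodness(h):
    """Root-mean-square of the layer's activations acts as the Q-estimate."""
    return float(np.sqrt(np.mean(h ** 2)))

def local_td_update(obs, action, reward, next_obs, next_action):
    """One TD step using only this layer's activation statistics."""
    global W
    x, h = layer_forward(obs, action)
    q = goodness(h)
    _, h_next = layer_forward(next_obs, next_action)
    target = reward + gamma * goodness(h_next)  # bootstrapped TD target
    td_error = target - q
    # Local gradient of the RMS goodness w.r.t. this layer's weights:
    # d RMS(h) / d h_i = h_i / (n * RMS(h)), masked by the ReLU.
    dq_dh = h / (h.size * max(q, 1e-8))
    dq_dh *= (h > 0)
    W += lr * td_error * np.outer(x, dq_dh)
    return td_error
```

Because the update uses only the layer's own pre- and post-activation values, stacking several such layers keeps learning local: each layer computes its own goodness-based Q-estimate and TD error without backpropagating through its neighbors.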
Frank Wu
Carnegie Mellon University, New York University
Mengye Ren
NYU
Machine Learning · Computer Vision · Artificial Intelligence