🤖 AI Summary
To address the challenge of applying backpropagation-based reinforcement learning (RL) in resource-constrained settings or with non-differentiable neural networks, this paper proposes a gradient-free, noise-driven RL method. The approach approximates directional derivatives via stochastic neurons and couples reward prediction errors with eligibility traces to enable purely local, biologically plausible temporal credit assignment. It is the first work to integrate directional derivative theory into reward-modulated Hebbian learning (RMHL), eliminating reliance on global error signals and differentiability assumptions. Empirically, the method significantly outperforms conventional RMHL on standard RL benchmarks and is competitive with backpropagation-based baselines, while remaining compatible with neuromorphic hardware. This establishes a viable, energy-efficient learning paradigm for edge intelligence applications.
📝 Abstract
Recent advances in reinforcement learning (RL) have led to significant improvements in task performance. However, training neural networks in an RL regime typically relies on backpropagation, limiting applicability in resource-constrained environments or with non-differentiable neural networks. While noise-based alternatives like reward-modulated Hebbian learning (RMHL) have been proposed, their performance has remained limited, especially in scenarios with delayed rewards, which require retrospective credit assignment over time. Here, we derive a novel noise-based learning rule that addresses these challenges. Our approach combines directional derivative theory with Hebbian-like updates to enable efficient, gradient-free learning in RL. It features stochastic noisy neurons which can approximate gradients, and produces local synaptic updates modulated by a global reward signal. Drawing on concepts from neuroscience, our method uses reward prediction error as its optimization target to generate increasingly advantageous behavior, and incorporates an eligibility trace to facilitate temporal credit assignment in environments with delayed rewards. Its formulation relies on local information alone, making it compatible with implementations in neuromorphic hardware. Experimental validation shows that our approach significantly outperforms RMHL and is competitive with backpropagation-based baselines, highlighting the promise of noise-based, biologically inspired learning for low-power and real-time applications.
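To make the ingredients concrete, here is a minimal sketch of a generic noise-based learning rule of the kind the abstract describes: injected noise probes the reward landscape in place of a gradient, a decaying eligibility trace stores the local noise–input correlation, and a global reward prediction error gates the weight update. This is not the paper's algorithm; the task (a single linear neuron matching a hidden mapping from scalar reward alone), the hyperparameters, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy task: a linear neuron receives x and should match a
# hidden target mapping w_true @ x. Only a scalar reward is observed --
# no gradients are ever computed.
n_in = 5
w_true = rng.normal(size=n_in)   # hidden target weights
w = np.zeros(n_in)               # learned weights

sigma = 0.1   # std of exploratory noise injected into the neuron
lam = 0.5     # eligibility-trace decay
eta = 0.02    # learning rate
alpha = 0.1   # update rate of the running reward baseline

r_bar = 0.0            # reward baseline (predicted reward)
e = np.zeros(n_in)     # eligibility trace

for step in range(10_000):
    x = rng.normal(size=n_in)
    xi = rng.normal(scale=sigma)        # perturbation of the neuron's output
    y = w @ x + xi                      # noisy activation
    r = -float((y - w_true @ x) ** 2)   # scalar reward only

    # Local, Hebbian-like eligibility: correlate the injected noise with
    # the presynaptic input, decaying over time.
    e = lam * e + xi * x

    # Global reward prediction error modulates the local trace.
    delta = r - r_bar
    r_bar += alpha * (r - r_bar)
    w += eta * delta * e

err = np.mean([(w @ x - w_true @ x) ** 2 for x in rng.normal(size=(500, n_in))])
print(err)
```

In expectation, the noise–reward correlation `delta * e` points along the reward gradient (the directional-derivative idea), so the neuron improves using only locally available quantities plus one global scalar; the trace `e` is what lets credit reach updates even when `delta` arrives late.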