🤖 AI Summary
Synaptic credit assignment—the biologically plausible and efficient allocation of learning signals to individual synapses—remains a fundamental challenge in neural network learning.
Method: Inspired by neuromodulatory reinforcement learning, we propose Dopamine, a gradient-free optimizer that replaces backpropagation’s global gradient computation and weight transmission with local, asynchronous synaptic updates driven by reward prediction error (RPE). Dopamine employs stochastic weight perturbations and an RPE-guided regret minimization rule to adaptively adjust synaptic weights without requiring derivative information.
Contribution/Results: By eliminating gradient-based computational redundancy, memory bottlenecks, and weight locking, Dopamine significantly enhances biological plausibility while enabling efficient, decentralized learning. In benchmark tasks—including XOR and chaotic time-series prediction—it converges faster than standard perturbation methods and achieves performance comparable to gradient-based algorithms, yet with substantially reduced computational and memory overhead.
📝 Abstract
Solving the synaptic Credit Assignment Problem (CAP) is central to learning in both biological and artificial neural systems. Solving the synaptic CAP optimally means setting the synaptic weights so that each neuron receives credit in proportion to its influence on the final output and behavior of the network or animal. Gradient-based methods solve this problem in artificial neural networks using back-propagation, but not in the most efficient way: back-propagation requires a chain of top-down gradient computations, making optimization expensive in computing power and memory and tying it to the well-known weight-transport and update-locking problems. To address these shortcomings, we take a NeuroAI approach and draw inspiration from neural Reinforcement Learning to develop Dopamine, a derivative-free optimizer for training neural networks. Dopamine builds on Weight Perturbation (WP) learning, which updates weights stochastically toward optima. It achieves this by minimizing regret, a form of Reward Prediction Error (RPE) between the expected outcome of the perturbed model and the actual outcome of the unperturbed model. We use this RPE to adjust the learning rate of the network (i.e., creating an adaptive learning-rate strategy, similar to the role of dopamine in the brain). We tested the Dopamine optimizer by training multi-layered perceptrons on XOR tasks and recurrent neural networks on chaotic time-series forecasting. Dopamine-trained models demonstrate accelerated convergence, outperform standard WP, and give performance comparable to gradient-based algorithms, while consuming significantly less computation and memory. Overall, the Dopamine optimizer not only finds robust solutions with performance comparable to state-of-the-art Machine Learning optimizers but is also more neurobiologically plausible.
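The weight-perturbation-with-RPE scheme described above can be sketched in a few lines of NumPy. This is an illustrative toy on the XOR task, not the paper's implementation: the network size, noise scale `sigma`, step size `eta`, and the exact update rule are our assumptions, and Dopamine's RPE-driven adaptive learning rate is omitted for brevity (here the RPE only sets the update's direction and magnitude).

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR task: 4 input patterns, binary targets
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

def init_params():
    # Tiny 2-4-1 MLP (hidden width 4 is an arbitrary choice)
    return {
        "W1": rng.normal(0.0, 1.0, (2, 4)), "b1": np.zeros(4),
        "W2": rng.normal(0.0, 1.0, (4, 1)), "b2": np.zeros(1),
    }

def forward(p, x):
    h = np.tanh(x @ p["W1"] + p["b1"])
    return 1.0 / (1.0 + np.exp(-(h @ p["W2"] + p["b2"])))  # sigmoid output

def loss(p):
    return float(np.mean((forward(p, X) - y) ** 2))  # MSE

def wp_train(p, steps=5000, sigma=0.05, eta=0.5):
    base = loss(p)  # actual outcome of the unperturbed model
    for _ in range(steps):
        # Local, derivative-free exploration: jitter every weight at once
        eps = {k: rng.normal(0.0, sigma, v.shape) for k, v in p.items()}
        rpe = loss({k: v + eps[k] for k, v in p.items()}) - base
        # Regret/RPE rule: move along the perturbation if it lowered the
        # loss (rpe < 0), against it otherwise; 1/sigma^2 rescales the step
        # so the expected update approximates the gradient
        for k in p:
            p[k] -= eta * (rpe / sigma**2) * eps[k]
        base = loss(p)
    return p

params = init_params()
initial_loss = loss(params)
params = wp_train(params)
final_loss = loss(params)
```

Note that no derivative is ever computed: each step needs only two forward passes (perturbed and unperturbed), which is what makes the update local and memory-light compared with back-propagation.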