Learning to Control Dynamical Agents via Spiking Neural Networks and Metropolis-Hastings Sampling

📅 2025-07-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of gradient-based optimization in spiking neural networks (SNNs) for reinforcement learning, which stems from the non-differentiability of spike events, this paper proposes a gradient-free training framework based on Metropolis-Hastings (M-H) sampling, reported as the first use of Bayesian sampling to train SNNs for dynamical agent control. Eschewing backpropagation, the method updates parameters solely via sparse reward signals, making it natively compatible with neuromorphic hardware. Evaluated on the Acrobot and CartPole benchmarks, it achieves higher cumulative rewards with fewer training episodes and smaller networks than deep Q-network and state-of-the-art SNN-based RL baselines. Key contributions: (1) the first M-H-based gradient-free optimization paradigm tailored to SNN reinforcement learning; and (2) end-to-end policy learning deployable on brain-inspired hardware.

📝 Abstract
Spiking Neural Networks (SNNs) offer biologically inspired, energy-efficient alternatives to traditional Deep Neural Networks (DNNs) for real-time control systems. However, their training presents several challenges, particularly for reinforcement learning (RL) tasks, due to the non-differentiable nature of spike-based communication. In this work, we introduce what is, to our knowledge, the first framework that employs Metropolis-Hastings (MH) sampling, a Bayesian inference technique, to train SNNs for dynamical agent control in RL environments without relying on gradient-based methods. Our approach iteratively proposes and probabilistically accepts network parameter updates based on accumulated reward signals, effectively circumventing the limitations of backpropagation while enabling direct optimization on neuromorphic platforms. We evaluated this framework on two standard control benchmarks: AcroBot and CartPole. The results demonstrate that our MH-based approach outperforms conventional Deep Q-Learning (DQL) baselines and prior SNN-based RL approaches in terms of maximizing the accumulated reward while minimizing network resources and training episodes.
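The loop the abstract describes (propose a parameter perturbation, run an episode, accept or reject based on accumulated reward) can be sketched as a Metropolis-Hastings random walk over policy parameters. This is a minimal illustration, not the paper's implementation: `episode_return` below is a toy stand-in for an SNN policy rollout in Acrobot or CartPole, and the proposal scale, temperature, and all names are illustrative assumptions.

```python
import math
import random

random.seed(0)

def propose(theta, sigma=0.1):
    # Gaussian random-walk proposal on a flat parameter vector
    return [w + random.gauss(0.0, sigma) for w in theta]

def episode_return(theta):
    # Toy stand-in for an environment rollout with an SNN policy:
    # a quadratic reward surface peaked at an arbitrary target.
    target = [1.0, -1.0, 0.5]
    return -sum((w - t) ** 2 for w, t in zip(theta, target))

def mh_train(theta, steps=2000, temperature=0.05):
    r_current = episode_return(theta)
    for _ in range(steps):
        candidate = propose(theta)
        r_candidate = episode_return(candidate)
        # M-H acceptance on reward: always keep improvements, and
        # keep worse candidates with probability exp((R_new - R_old) / T)
        if math.log(random.random()) < (r_candidate - r_current) / temperature:
            theta, r_current = candidate, r_candidate
    return theta, r_current

theta, r = mh_train([0.0, 0.0, 0.0])
```

Because acceptance depends only on episode returns, the loop never differentiates through spike events, which is what makes this style of optimization a candidate for direct execution on neuromorphic platforms.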
Problem

Research questions and friction points this paper is trying to address.

Train SNNs for RL without gradient-based methods
Overcome non-differentiable spike communication challenges
Optimize dynamical agent control on neuromorphic platforms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Spiking Neural Networks for control
Applies Metropolis-Hastings sampling training
Optimizes rewards without gradient methods