AI Summary
This work addresses the limitations of conventional deep brain stimulation (DBS) systems, which rely on fixed stimulation parameters and often incur high energy consumption and adverse side effects. While existing reinforcement learning approaches offer adaptive control, their computational complexity and slow convergence hinder deployment on resource-constrained implantable devices. To overcome these challenges, the authors propose a lightweight, time- and threshold-triggered multi-armed bandit (T3P MAB) algorithm that enables joint adaptive tuning of stimulation frequency and amplitude. Notably, this is the first implementation of a multi-armed bandit framework for DBS on an embedded microcontroller. The method requires no offline training, features low computational overhead, rapid convergence, and minimal power consumption, and demonstrates superior performance over existing reinforcement learning strategies across multiple microcontroller platforms, offering a practical and efficient pathway toward intelligent optimization in implantable neuromodulation devices.
Abstract
Deep Brain Stimulation (DBS) has proven to be a promising treatment for Parkinson's Disease (PD). DBS involves stimulating specific regions of the brain's Basal Ganglia (BG) with electrical impulses to alleviate PD symptoms such as tremor, rigidity, and bradykinesia. Most clinical DBS approaches today use a fixed frequency and amplitude, and they suffer from side effects (such as slurred speech) and shortened battery life of the implant. Reinforcement learning (RL) approaches have been used in recent research to perform DBS more adaptively and improve overall patient outcomes. These RL algorithms are, however, too complex to be trained in vivo because of their long convergence times and high computational requirements. We propose a new Time & Threshold-Triggered Multi-Armed Bandit (T3P MAB) RL approach for DBS that is more effective than existing algorithms. Further, our T3P agent is lightweight enough to be deployed in the implant, unlike current deep-RL strategies, and does not need an offline training phase. Additionally, most existing RL approaches modulate only frequency or only amplitude, and the possibility of tuning them together remains largely unexplored in the literature. Our RL agent tunes both the frequency and the amplitude of DBS signals with better sample efficiency and minimal convergence time. We implement an MAB agent for DBS on hardware for the first time, report energy measurements, and demonstrate its suitability for resource-constrained platforms. Our T3P MAB algorithm is deployed on a variety of microcontroller unit (MCU) setups to show its power efficiency relative to other RL approaches used in recent work.
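To make the idea of a time- and threshold-triggered bandit over joint (frequency, amplitude) settings more concrete, the following is a minimal sketch and not the authors' T3P algorithm, whose details the abstract does not give. Everything in it is an assumption made for illustration: the class name `TriggeredBandit`, the epsilon-greedy selection rule, the running-mean value update, the `period` and `threshold` parameters, and the scalar `biomarker` input (a hypothetical symptom-severity signal). The arm grid of frequencies and amplitudes is also hypothetical.

```python
import random
from itertools import product


class TriggeredBandit:
    """Illustrative epsilon-greedy bandit over (frequency, amplitude) arms.

    A new arm is chosen only when a time trigger (a fixed number of steps
    has elapsed) or a threshold trigger (the biomarker exceeds a limit)
    fires. All names and rules here are assumptions, not the paper's method.
    """

    def __init__(self, freqs_hz, amps_ma, period=10, threshold=0.5,
                 epsilon=0.1, seed=0):
        # Each arm is one joint (frequency, amplitude) setting.
        self.arms = list(product(freqs_hz, amps_ma))
        self.counts = [0] * len(self.arms)    # pulls per arm
        self.values = [0.0] * len(self.arms)  # running mean reward per arm
        self.period = period                  # time trigger, in steps
        self.threshold = threshold            # biomarker trigger level
        self.epsilon = epsilon                # exploration probability
        self.rng = random.Random(seed)
        self.current = 0                      # index of the active arm
        self.steps_since_select = 0

    def _select(self):
        # Explore with probability epsilon, otherwise exploit the best arm.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.arms))
        return max(range(len(self.arms)), key=lambda i: self.values[i])

    def step(self, reward, biomarker):
        """Record the reward for the active arm, then re-select if triggered."""
        i = self.current
        self.counts[i] += 1
        # Incremental mean: O(1) memory and arithmetic per update.
        self.values[i] += (reward - self.values[i]) / self.counts[i]
        self.steps_since_select += 1

        time_trigger = self.steps_since_select >= self.period
        threshold_trigger = biomarker > self.threshold
        if time_trigger or threshold_trigger:
            self.current = self._select()
            self.steps_since_select = 0
        return self.arms[self.current]


if __name__ == "__main__":
    agent = TriggeredBandit([60, 130], [1.0, 2.0], period=3, epsilon=0.0)
    print(agent.step(reward=-1.0, biomarker=0.1))  # no trigger fires
    print(agent.step(reward=-1.0, biomarker=0.9))  # threshold trigger fires
```

The per-step state in this sketch is two small arrays and a counter, and each update is a handful of arithmetic operations with no training phase. That is the general property that makes bandit-style agents plausible candidates for microcontroller deployment. The actual reward definition and trigger logic would have to come from the paper itself.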