Hardware-Software Collaborative Computing of Photonic Spiking Reinforcement Learning for Robotic Continuous Control

📅 2025-11-29
🤖 AI Summary
To address stringent energy-efficiency and latency requirements in robotic continuous-control tasks, this work proposes a photonic–electronic hybrid spiking reinforcement learning (RL) architecture. It pioneers the use of a programmable silicon-based Mach–Zehnder interferometer (MZI) photonic chip for continuous control: the optical hardware performs ultra-fast linear matrix operations, while the spiking neural network (SNN) nonlinear activations are implemented in the electronic domain. The Twin Delayed Deep Deterministic policy gradient (TD3) algorithm is integrated with the SNN to form a mixed-signal computational paradigm. Evaluated on Pendulum-v1 and HalfCheetah-v2, the system achieves a reward of 5831 on HalfCheetah-v2, reduces convergence steps by 23.3%, keeps action deviation below 2.2%, attains an energy efficiency of 1.39 TOPS/W, and exhibits a computational latency of only 120 ps. Key contributions include: (i) the first programmable photonic spiking RL implementation tailored for continuous control, and (ii) a hybrid computing paradigm featuring tight photonic–electronic co-design and joint algorithm–hardware optimization.
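To make the optical half of this split concrete, here is a minimal plain-Python sketch of how a mesh of MZIs realizes a linear transform. It assumes one common MZI convention (external phase φ, 50:50 coupler, internal phase θ, 50:50 coupler) and a small 4-mode rectangular mesh with random phases; the helper names (`mzi`, `embed`, `mesh`, `apply`) and the mesh size are illustrative, not the authors' chip layout or calibration procedure. Each MZI is a 2×2 unitary, so any cascade is unitary and conserves optical power. Arbitrary real weight matrices are commonly mapped onto such meshes via a singular value decomposition plus attenuators, a step omitted here.

```python
import cmath
import math
import random

def matmul(a, b):
    """Plain-Python complex matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mzi(theta, phi):
    """2x2 transfer matrix of one MZI (assumed convention): light sees the
    external phase phi, a 50:50 coupler, internal phase theta, then a 50:50 coupler."""
    s = 1 / math.sqrt(2)
    coupler = [[s, 1j * s], [1j * s, s]]
    inner = [[cmath.exp(1j * theta), 0], [0, 1]]
    outer = [[cmath.exp(1j * phi), 0], [0, 1]]
    return matmul(coupler, matmul(inner, matmul(coupler, outer)))

def identity(n):
    return [[complex(i == j) for j in range(n)] for i in range(n)]

def embed(t, n, m):
    """Embed a 2x2 MZI acting on waveguides m and m+1 into an n x n identity."""
    u = identity(n)
    u[m][m], u[m][m + 1] = t[0][0], t[0][1]
    u[m + 1][m], u[m + 1][m + 1] = t[1][0], t[1][1]
    return u

def mesh(n, settings):
    """Cascade MZIs in order; settings is a list of (m, theta, phi)."""
    u = identity(n)
    for m, theta, phi in settings:
        u = matmul(embed(mzi(theta, phi), n, m), u)
    return u

def apply(u, x):
    """Propagate a vector of complex field amplitudes through the mesh."""
    return [sum(u[i][j] * x[j] for j in range(len(x))) for i in range(len(u))]

# Hypothetical 4-mode rectangular mesh: columns alternate MZIs on (0,1),(2,3) and (1,2).
random.seed(0)
n = 4
settings = [(m, random.uniform(0, 2 * math.pi), random.uniform(0, 2 * math.pi))
            for col in range(n) for m in ((0, 2) if col % 2 == 0 else (1,))]
U = mesh(n, settings)

x = [0.5, -0.2, 0.8, 0.1]                # input amplitudes encoded on optical carriers
y = apply(U, x)
intensities = [abs(v) ** 2 for v in y]   # what square-law photodetectors would read
print([round(p, 4) for p in intensities], round(sum(intensities), 4))
```

In this convention, θ = π gives the bar state (light stays in its waveguide) and θ = 0 the cross state; intermediate θ splits power as sin²(θ/2) : cos²(θ/2), which is how phase settings encode matrix weights. The total detected power equals the input power because U is unitary.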

📝 Abstract
Robotic continuous control tasks impose stringent demands on the energy efficiency and latency of computing architectures due to their high-dimensional state spaces and real-time interaction requirements. Conventional electronic computing platforms face computational bottlenecks, whereas the fusion of photonic computing and spiking reinforcement learning (RL) offers a promising alternative. Here, we propose a novel computing architecture based on photonic spiking RL, which integrates the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm with a spiking neural network (SNN). The proposed architecture employs an optical-electronic hybrid computing paradigm in which a silicon photonic Mach-Zehnder interferometer (MZI) chip executes linear matrix computations, while nonlinear spiking activations are performed in the electronic domain. Experimental validation on the Pendulum-v1 and HalfCheetah-v2 benchmarks demonstrates the system's capability for software-hardware co-inference, achieving a control policy reward of 5831 on HalfCheetah-v2, a 23.33% reduction in convergence steps, and an action deviation below 2.2%. Notably, this work represents the first application of a programmable MZI photonic computing chip to robotic continuous control tasks, attaining an energy efficiency of 1.39 TOPS/W and an ultralow computational latency of 120 ps. Such performance underscores the promise of photonic spiking RL for real-time decision-making in autonomous and industrial robotic systems.
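TD3's critic target combines three ingredients: clipped double-Q learning (take the minimum of two target critics), target-policy smoothing (add clipped Gaussian noise to the target action), and delayed actor and target-network updates. The sketch below shows these in plain Python with the networks abstracted as callables; in the architecture described above the actor's linear layers would run on the MZI chip, which this sketch does not model. The default hyperparameters (γ = 0.99, σ = 0.2, noise clip 0.5, policy delay 2, τ = 0.005) are values commonly used with TD3 and are assumptions, not settings reported for this work; `toy_actor`, `toy_q1`, and `toy_q2` are hypothetical stand-ins.

```python
import random

def td3_target(r, s_next, done, actor_target, q1_target, q2_target,
               gamma=0.99, sigma=0.2, noise_clip=0.5,
               a_low=-1.0, a_high=1.0, rng=random):
    """Bootstrapped TD3 critic target for a single transition."""
    # Target-policy smoothing: perturb the target action with clipped Gaussian noise.
    a_next = []
    for a in actor_target(s_next):
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, sigma)))
        a_next.append(max(a_low, min(a_high, a + eps)))
    # Clipped double-Q: take the smaller of the two target critics' estimates.
    q_min = min(q1_target(s_next, a_next), q2_target(s_next, a_next))
    # No bootstrapping from terminal states.
    return r + gamma * (1.0 - float(done)) * q_min

def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging of target-network parameters toward the online ones."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]

def actor_update_due(step, policy_delay=2):
    """Delayed policy update: actor and targets update once every `policy_delay` critic steps."""
    return step % policy_delay == 0

# Hypothetical toy stand-ins for the target actor and critics.
def toy_actor(s):
    return [0.3]

def toy_q1(s, a):
    return 1.0 + a[0]

def toy_q2(s, a):
    return 0.5 + a[0]

y = td3_target(1.0, [0.0], False, toy_actor, toy_q1, toy_q2, sigma=0.0)
print(y)
```

With σ = 0 the target is 1.0 + 0.99 × min(1.3, 0.8) ≈ 1.792. Taking the pessimistic minimum of the two critics is what curbs the overestimation bias of single-critic DDPG.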
Problem

Research questions and friction points this paper is trying to address.

Develops photonic spiking RL for robotic continuous control tasks
Integrates the TD3 algorithm with an SNN in an optical-electronic hybrid architecture
Achieves high energy efficiency and low latency for real-time decision-making
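The two headline hardware figures translate into per-operation terms with simple arithmetic. Since 1 TOPS/W means 10¹² operations per joule, 1.39 TOPS/W corresponds to roughly 0.72 pJ per operation. The sketch below also prices a hypothetical 64 × 64 matrix-vector product (counting one multiply-accumulate as two operations, a common but not universal convention) and compares the 120 ps latency with an assumed 1 ms control period; the layer size and control period are illustrative assumptions, not figures from the paper.

```python
def energy_per_op_joules(tops_per_watt):
    """1 TOPS/W = 1e12 operations per joule, so each operation costs 1 / (TOPS/W * 1e12) J."""
    return 1.0 / (tops_per_watt * 1e12)

def matvec_ops(rows, cols, ops_per_mac=2):
    """Operation count of a dense matrix-vector product (assumed: 1 MAC = 2 operations)."""
    return rows * cols * ops_per_mac

e_op = energy_per_op_joules(1.39)   # reported energy efficiency
ops = matvec_ops(64, 64)            # hypothetical 64 x 64 layer
latency_s = 120e-12                 # reported computational latency
control_period_s = 1e-3             # assumed 1 kHz control loop

print(f"{e_op * 1e12:.3f} pJ/op")
print(f"{ops} ops -> {ops * e_op * 1e9:.2f} nJ per 64x64 matvec")
print(f"{control_period_s / latency_s:.3g} back-to-back passes per control period")
```

This gives about 5.89 nJ for the hypothetical layer and roughly 8.3 million sequential 120 ps passes per 1 ms period, which illustrates how little of a typical control loop's time budget the optical linear algebra itself would occupy.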
Innovation

Methods, ideas, or system contributions that make the work stand out.

Photonic-electronic hybrid computing for robotic control tasks.
A programmable MZI chip performs linear matrix computations in the optical domain.
Integrates the TD3 algorithm with spiking neural networks whose nonlinear activations run in the electronic domain.
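The hybrid split listed above (optical linear layers, electronic spiking nonlinearity) can be sketched as a forward pass for a small spiking actor. This is a minimal sketch under stated assumptions: the photonic layer is modeled as an ideal real matrix-vector product with optional Gaussian noise standing in for analog error, the electronic neurons are leaky integrate-and-fire (LIF) units with hard reset, spikes are rate-decoded over a fixed number of time steps, and a tanh maps the readout to the action range. The function names, neuron model, decay, threshold, time-step count, weights, and noise model are all hypothetical choices, not the paper's reported design.

```python
import math
import random

def optical_matvec(w, x, noise_std=0.0, rng=random):
    """Stand-in for the MZI layer: ideal y = W x plus optional Gaussian noise (hypothetical error model)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + rng.gauss(0.0, noise_std)
            for row in w]

def lif_step(v, current, decay=0.5, threshold=1.0):
    """One electronic LIF update per neuron: leak, integrate, fire, hard reset."""
    spikes, v_next = [], []
    for vi, ii in zip(v, current):
        vi = decay * vi + ii
        fired = vi >= threshold
        spikes.append(1.0 if fired else 0.0)
        v_next.append(0.0 if fired else vi)
    return spikes, v_next

def hybrid_actor(x, w_hidden, w_out, timesteps=8, noise_std=0.0, rng=random):
    """Optical linear layer -> electronic LIF spiking -> rate decoding -> optical readout -> tanh."""
    current = optical_matvec(w_hidden, x, noise_std, rng)  # static input, so computed once
    v = [0.0] * len(w_hidden)
    counts = [0.0] * len(w_hidden)
    for _ in range(timesteps):
        spikes, v = lif_step(v, current)
        counts = [c + s for c, s in zip(counts, spikes)]
    rates = [c / timesteps for c in counts]
    out = optical_matvec(w_out, rates, noise_std, rng)
    return [math.tanh(o) for o in out]  # squash to the [-1, 1] action range

w_hidden = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical weights
w_out = [[1.0, -1.0, 0.5]]
state = [1.2, 0.3]
action = hybrid_actor(state, w_hidden, w_out)
print(action)
```

Here the three hidden neurons fire at rates 1.0, 0.0, and 0.5, so the readout is tanh(1.25) ≈ 0.848. Rerunning with `noise_std > 0` shows how analog error in the optical layers perturbs the action, which is one way to probe a hardware-versus-software action deviation such as the below-2.2% figure reported above.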
Mengting Yu
State Key Laboratory of Integrated Service Networks, State Key Discipline Laboratory of Wide Bandgap Semiconductor Technology, Xidian University, Xi’an 710071, China
Shuiying Xiang
State Key Laboratory of Integrated Service Networks, State Key Discipline Laboratory of Wide Bandgap Semiconductor Technology, Xidian University, Xi’an 710071, China
Changjian Xie
State Key Laboratory of Integrated Service Networks, State Key Discipline Laboratory of Wide Bandgap Semiconductor Technology, Xidian University, Xi’an 710071, China
Yonghang Chen
State Key Laboratory of Integrated Service Networks, State Key Discipline Laboratory of Wide Bandgap Semiconductor Technology, Xidian University, Xi’an 710071, China
Haowen Zhao
PhD student, University of Cambridge
Biomolecular design, Machine learning
Xingxing Guo
State Key Laboratory of Integrated Service Networks, State Key Discipline Laboratory of Wide Bandgap Semiconductor Technology, Xidian University, Xi’an 710071, China
Yahui Zhang
State Key Laboratory of Integrated Service Networks, State Key Discipline Laboratory of Wide Bandgap Semiconductor Technology, Xidian University, Xi’an 710071, China
Yanan Han
State Key Laboratory of Integrated Service Networks, State Key Discipline Laboratory of Wide Bandgap Semiconductor Technology, Xidian University, Xi’an 710071, China
Yue Hao
State Key Laboratory of Integrated Service Networks, State Key Discipline Laboratory of Wide Bandgap Semiconductor Technology, Xidian University, Xi’an 710071, China