🤖 AI Summary
To address low parallelism, high power consumption, and imbalanced inter-core communication in training Spiking Neural Networks (SNNs) on many-core near-memory computing architectures, this paper proposes a core-mapping optimization method based on an off-policy deterministic Actor-Critic algorithm (a DDPG variant). The method integrates Graph Convolutional Networks (GCNs) to model topology-aware system states, designs a memory-computation-balanced model partitioning strategy, and introduces a continuous-action-space discretization mechanism, forming a multi-objective co-optimization framework that targets computational parallelism, power efficiency, and inter-core communication load. Experimental results demonstrate that the proposed method significantly reduces chip power consumption and training time, decreases average flow load by 23.6%, improves system throughput by 19.4%, and enhances communication efficiency.
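As a rough illustration of how a GCN can encode the core topology into the policy network's state, the sketch below implements one standard graph-convolution layer with symmetric normalization over a hypothetical core-interconnect adjacency matrix. The function name, feature choices, and shapes are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution layer over the core topology (sketch).

    adj:    (N, N) adjacency matrix of the physical core network
            (e.g. a NoC mesh) -- assumed interface
    feats:  (N, F) per-core features (e.g. compute load, memory use)
    weight: (F, F') learnable projection matrix

    Uses the common symmetric normalization
    A_hat = D^{-1/2} (A + I) D^{-1/2}, followed by a ReLU.
    """
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)  # ReLU activation

# Example: a 4-core ring topology with two features per core
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
feats = np.ones((4, 2))
state_embedding = gcn_layer(adj, feats, np.eye(2))
```

Stacking a few such layers yields per-core embeddings that make the policy's state representation topology-aware, which is the role GCNs play in the summary above.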
📝 Abstract
With the widening application scope of spiking neural networks (SNNs), model complexity has surged, driving exponential growth in demand for AI computing power. As a new-generation computing architecture for neural networks, many-core near-memory computing systems have drawn much attention for the efficiency and power consumption of their distributed storage and parallel computing, and the mapping problem from logical cores to physical cores is one of the central research topics. To improve the computing parallelism and system throughput of the many-core near-memory computing system and to reduce power consumption, we propose an SNN training many-core deployment optimization method based on an off-policy deterministic Actor-Critic algorithm. We use deep reinforcement learning as a nonlinear optimizer, treating the many-core topology as network graph features and applying graph convolution to feed the many-core structure into the policy network. We update the parameters of the policy network through proximal policy optimization to optimize the deployment of SNN models on the many-core near-memory computing architecture and reduce chip power consumption. To handle large-dimensional action spaces, the policy network outputs continuous values matching the number of cores, which are then discretized to obtain new deployment schemes. Furthermore, to balance inter-core computation latency and improve system throughput, we propose a model partitioning method with a balanced storage-and-computation strategy. Our method overcomes problems such as uneven computation and storage loads across cores and the formation of local communication hotspots, significantly reducing model training time, communication cost, and average inter-core flow load in the many-core near-memory computing architecture.
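The abstract's discretization step (one continuous policy output per core, re-discretized into a deployment scheme) can be sketched with a rank-ordering rule. The paper's exact discretization mechanism is not given here; argsort-based ranking is one common choice that always produces a valid permutation, so every logical core lands on exactly one physical core.

```python
import numpy as np

def action_to_mapping(action):
    """Convert a continuous policy output (one value per core -- an
    assumed interface) into a discrete logical-to-physical core mapping.

    Rank-ordering the raw outputs yields a permutation, guaranteeing a
    conflict-free deployment scheme regardless of the action's values.
    """
    ranks = np.argsort(action)               # indices sorted by value, ascending
    mapping = np.empty_like(ranks)
    mapping[ranks] = np.arange(len(action))  # logical core i -> physical core mapping[i]
    return mapping

# Example: four continuous outputs become a permutation of 4 physical cores
scheme = action_to_mapping(np.array([0.3, -1.2, 2.0, 0.0]))
```

A permutation-producing discretization sidesteps the combinatorial blow-up of a direct categorical action space over all N! placements, which is the motivation the abstract gives for using continuous actions.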