AI Summary
This study addresses energy-efficient accelerator design for SoCs by systematically comparing operation-centric coarse-grained reconfigurable arrays (CGRAs) and iteration-centric tightly-coupled processor arrays (TCPAs) in mapping and executing nested loops. We propose, for the first time, a unified modeling framework that evaluates four key dimensions: loop mapping granularity, data reuse potential, control overhead, and scalability. Our methodology integrates RTL-level modeling, polyhedral compilation, loop tiling analysis, communication complexity modeling, and cycle-accurate simulation. Experimental results on image processing and linear algebra workloads show that, compared with CGRAs, TCPAs achieve on average 37% lower dynamic power consumption, a 2.1× improvement in energy efficiency, and a 42% reduction in on-chip network traffic. These findings highlight the pronounced energy-efficiency advantages of the iteration-centric paradigm, particularly for high-dimensional nested loops, and provide principled insights for next-generation accelerator architecture design.
Abstract
Increasing demands for computing power drive the need for energy-efficient SoC accelerator architectures. One class of such accelerators is so-called processor arrays, which typically integrate a two-dimensional mesh of interconnected processing elements (PEs). Such arrays are specifically designed to accelerate the execution of multidimensional nested loops by exploiting the intrinsic parallelism of loops. For mapping a given loop nest application, two opposing methods have emerged: operation-centric and iteration-centric. They differ in mapping granularity. The operation-centric approach maps individual operations to the PEs of the array, while the iteration-centric approach maps entire tiles of iterations to each PE. The operation-centric approach is applied predominantly to processor arrays often referred to as Coarse-Grained Reconfigurable Arrays (CGRAs), whereas processor arrays supporting the iteration-centric approach are referred to in the following as Tightly-Coupled Processor Arrays (TCPAs). This work provides a comprehensive comparison of both approaches and their related architectures by evaluating their respective benefits and trade-offs. ...
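To make the difference in mapping granularity concrete, here is a toy sketch, not the authors' mapping flow. It assumes a hypothetical 2-D loop nest `y[i] += a[i][j] * x[j]` whose body is written as a short list of operations. `operation_centric` places each body operation on a PE (round-robin), so every PE runs its operation for all iterations. `iteration_centric` splits the iteration space into rectangular tiles, one per PE, and each PE runs the whole body for its tile. The function names, operation names, and the round-robin and rectangular-tiling policies are illustrative assumptions. Real CGRA compilers typically use modulo scheduling and routing, and real TCPA flows use polyhedral tiling and scheduling.

```python
from itertools import product

# Hypothetical loop body of y[i] += a[i][j] * x[j], as a list of operations.
LOOP_BODY_OPS = ["load_a", "load_x", "mul", "add", "store_y"]


def operation_centric(ops, num_pes):
    """Assign each loop-body operation to a PE (round-robin if there are
    more operations than PEs). Each PE then executes its operation(s) for
    every iteration, with intermediate values passed between PEs."""
    mapping = {pe: [] for pe in range(num_pes)}
    for k, op in enumerate(ops):
        mapping[k % num_pes].append(op)
    return mapping


def iteration_centric(n_i, n_j, pe_rows, pe_cols):
    """Partition the n_i x n_j iteration space into rectangular tiles on a
    pe_rows x pe_cols PE grid. Each PE executes the full loop body for
    every iteration in its tile."""
    tile_i = -(-n_i // pe_rows)  # ceiling division: tile height
    tile_j = -(-n_j // pe_cols)  # ceiling division: tile width
    mapping = {}
    for i, j in product(range(n_i), range(n_j)):
        pe = (i // tile_i, j // tile_j)
        mapping.setdefault(pe, []).append((i, j))
    return mapping


if __name__ == "__main__":
    print(operation_centric(LOOP_BODY_OPS, 4))
    for pe, iterations in sorted(iteration_centric(4, 4, 2, 2).items()):
        print(pe, iterations)
```

With four PEs, the operation-centric mapping spreads the five body operations across the array. The iteration-centric mapping of a 4×4 iteration space on a 2×2 grid gives each PE one 2×2 tile, for example `(0, 0), (0, 1), (1, 0), (1, 1)` on PE `(0, 0)`.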