Mapping and Execution of Nested Loops on Processor Arrays: CGRAs vs. TCPAs

📅 2025-02-17
🤖 AI Summary
This study addresses energy-efficient accelerator design for SoCs by systematically comparing operation-centric coarse-grained reconfigurable arrays (CGRAs) and iteration-centric tightly-coupled processor arrays (TCPAs) in mapping and executing nested loops. We propose, for the first time, a unified modeling framework that evaluates four key dimensions: loop mapping granularity, data reuse potential, control overhead, and scalability. Our methodology integrates RTL-level modeling, polyhedral compilation, loop tiling analysis, communication complexity modeling, and cycle-accurate simulation. Experimental results demonstrate that TCPAs achieve, on average, 37% lower dynamic power consumption, a 2.1× improvement in energy efficiency, and a 42% reduction in on-chip network traffic compared with CGRAs for image processing and linear algebra workloads. These findings highlight the pronounced energy-efficiency advantages of the iteration-centric paradigm, particularly for high-dimensional nested loops, and provide principled insights for next-generation accelerator architecture design.

📝 Abstract
Increasing demands for computing power also propel the need for energy-efficient SoC accelerator architectures. One class of such accelerators is the so-called processor array, which typically integrates a two-dimensional mesh of interconnected processing elements (PEs). Such arrays are specifically designed to accelerate the execution of multidimensional nested loops by exploiting the intrinsic parallelism of loops. For mapping a given loop nest application, two opposing methods have emerged: operation-centric and iteration-centric. They differ in the granularity of the mapping. The operation-centric approach maps individual operations to the PEs of the array, while the iteration-centric approach maps entire tiles of iterations to each PE. The operation-centric approach is applied predominantly to processor arrays often referred to as Coarse-Grained Reconfigurable Arrays (CGRAs), while processor arrays supporting an iteration-centric approach are referred to as Tightly-Coupled Processor Arrays (TCPAs) in the following. This work provides a comprehensive comparison of both approaches and related architectures by evaluating their respective benefits and trade-offs. ...
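To make the difference in mapping granularity concrete, the following is a minimal Python sketch and not the mapping flow of the paper. It assumes a rectangular two-dimensional loop nest and a rectangular PE mesh; the function names `iteration_centric` and `operation_centric`, the row-major placement, and the operation labels are hypothetical choices made only for illustration.

```python
from math import ceil


def iteration_centric(n_rows, n_cols, pe_rows, pe_cols):
    """Toy iteration-centric mapping: each PE owns one tile of iterations.

    Returns a dict from iteration (i, j) to the PE coordinate (r, c)
    that executes the whole loop body for that iteration.
    """
    tile_r = ceil(n_rows / pe_rows)  # tile height in iterations
    tile_c = ceil(n_cols / pe_cols)  # tile width in iterations
    return {
        (i, j): (i // tile_r, j // tile_c)
        for i in range(n_rows)
        for j in range(n_cols)
    }


def operation_centric(body_ops, pe_rows, pe_cols):
    """Toy operation-centric mapping: each loop-body operation gets a PE.

    Placement is plain row-major order (an assumption for illustration;
    real CGRA mappers also solve routing and scheduling). Every
    iteration then flows through this same placement.
    """
    if len(body_ops) > pe_rows * pe_cols:
        raise ValueError("more operations than PEs in this toy placement")
    return {op: divmod(k, pe_cols) for k, op in enumerate(body_ops)}
```

For a 4×4 iteration space on a 2×2 array, `iteration_centric(4, 4, 2, 2)` assigns each PE a 2×2 tile of iterations, whereas `operation_centric(["load_a", "load_b", "mul", "add"], 2, 2)` spreads the four operations of a single loop body across the four PEs.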
Problem

Research questions and friction points this paper is trying to address.

Compare CGRA and TCPA mapping methods
Evaluate energy-efficient SoC accelerators
Optimize multidimensional nested loop execution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exploits intrinsic parallelism of loops
Compares operation-centric and iteration-centric mapping
Evaluates CGRA versus TCPA architectures

Dominik Walter
Hardware/Software Co-Design, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany
Marita Halm
Hardware/Software Co-Design, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany
Daniel Seidel
Hardware/Software Co-Design, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany
Indrayudh Ghosh
Hardware/Software Co-Design, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany
Christian Heidorn
Hardware/Software Co-Design, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany
Frank Hannig
Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
Embedded Systems, Hardware/Software Co-Design, Domain-specific Computing, Parallelization, High
Jürgen Teich
Full Professor, Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg
embedded systems, hardware/software co-design, reconfigurable computing