Mapping and Execution of Nested Loops on Processor Arrays: CGRAs vs. TCPAs

📅 2025-02-17

🤖 AI Summary

This study addresses energy-efficient accelerator design for SoCs by systematically comparing operation-centric coarse-grained reconfigurable arrays (CGRAs) and iteration-centric tightly-coupled processor arrays (TCPAs) for mapping and executing nested loops. We propose, for the first time, a unified modeling framework evaluating four key dimensions: loop mapping granularity, data reuse potential, control overhead, and scalability. Our methodology integrates RTL-level modeling, polyhedral compilation, loop tiling analysis, communication complexity modeling, and cycle-accurate simulation. Experimental results demonstrate that, for image processing and linear algebra workloads, TCPAs achieve on average 37% lower dynamic power consumption, a 2.1× improvement in energy efficiency, and a 42% reduction in on-chip network traffic compared with CGRAs. These findings highlight the pronounced energy-efficiency advantages of the iteration-centric paradigm, particularly in high-dimensional nested loop scenarios, and thereby provide principled insights for next-generation accelerator architecture design.

📝 Abstract

Increasing demands for computing power also propel the need for energy-efficient SoC accelerator architectures. One class of such accelerators is so-called processor arrays, which typically integrate a two-dimensional mesh of interconnected processing elements (PEs). Such arrays are specifically designed to accelerate the execution of multidimensional nested loops by exploiting the intrinsic parallelism of loops. For mapping a given loop nest application, two opposed mapping methods have emerged: operation-centric and iteration-centric. They differ in the granularity of the mapping. The operation-centric approach maps individual operations to the PEs of the array, while the iteration-centric approach maps entire tiles of iterations to each PE. The operation-centric approach is predominantly applied to processor arrays often referred to as Coarse-Grained Reconfigurable Arrays (CGRAs), while processor arrays supporting an iteration-centric approach are referred to as Tightly-Coupled Processor Arrays (TCPAs) in the following. This work provides a comprehensive comparison of both approaches and related architectures by evaluating their respective benefits and trade-offs. ...
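To make the difference in mapping granularity concrete, here is a minimal toy sketch, not the paper's tooling. It assumes a 2-D loop nest whose body is modeled as a flat list of hypothetical operations (`BODY_OPS`), and a rectangular PE grid. The helper names `operation_centric` and `iteration_centric` are illustrative only. The first assigns each body operation to a PE, so every iteration reuses that placement (CGRA-style). The second splits the iteration space into rectangular tiles, one per PE, and each PE runs the whole body for its tile (TCPA-style). Real mappers also handle scheduling, routing, and data dependences, which this sketch ignores.

```python
from collections import Counter
from itertools import product

# Hypothetical loop body of: for i in range(N): for j in range(M): body(i, j)
# modeled as a list of named operations (a toy stand-in for a dataflow graph).
BODY_OPS = ["load_a", "load_b", "mul", "add", "store"]


def operation_centric(ops, pe_grid):
    """Map each body operation to one PE (CGRA-style granularity).

    All iterations reuse the same op-to-PE placement; iterations would be
    streamed through the array over time. This toy version does not
    time-multiplex, so it requires at least as many PEs as operations.
    """
    rows, cols = pe_grid
    pes = list(product(range(rows), range(cols)))  # row-major PE order
    if len(ops) > len(pes):
        raise ValueError("more operations than PEs; a real mapper would time-multiplex")
    return {op: pes[k] for k, op in enumerate(ops)}


def iteration_centric(n, m, pe_grid):
    """Partition the n x m iteration space into tiles, one per PE (TCPA-style).

    Each PE executes every body operation for all iterations in its tile.
    Ceiling division keeps the tiles covering the whole iteration space.
    """
    rows, cols = pe_grid
    tile_i = -(-n // rows)  # tile height, ceil(n / rows)
    tile_j = -(-m // cols)  # tile width, ceil(m / cols)
    return {(i, j): (i // tile_i, j // tile_j) for i, j in product(range(n), range(m))}


if __name__ == "__main__":
    # Operation-centric: 5 operations placed on a 2x3 array.
    print(operation_centric(BODY_OPS, (2, 3)))
    # Iteration-centric: a 4x6 iteration space split over a 2x2 array;
    # each PE receives a 2x3 tile, i.e. 6 iterations.
    print(Counter(iteration_centric(4, 6, (2, 2)).values()))
```

In the first mapping, the number of PEs used is bounded by the size of the loop body. In the second, it is bounded by the size of the iteration space. This is the granularity difference the abstract describes.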

Problem

Research questions and friction points this paper is trying to address.

Compare CGRA and TCPA mapping methods

Evaluate energy-efficient SoC accelerators

Optimize multidimensional nested loop execution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Exploits intrinsic parallelism of loops

Compares operation-centric and iteration-centric mapping

Evaluates CGRA versus TCPA architectures
