AI Summary
Existing coarse-grained reconfigurable array (CGRA) mapping toolchains exhibit limited effectiveness and poor resource utilization, in particular low PE utilization (<40% on average) and high mapping failure rates, when targeting multidimensional nested loops (e.g., GEMM, triangular solvers), largely due to inadequate modeling of complex, irregular computation structures.
Method: We propose a co-optimization methodology integrating loop tiling, dataflow analysis, and hardware constraint modeling, embedded within a unified measurement-simulation evaluation framework.
Contribution/Results: This work presents the first systematic, cross-toolchain assessment of CGRA mapping efficacy on representative nested-loop kernels. It reveals fundamental limitations in current toolchains' ability to model triangular-solver-style control- and data-dependent loops. Experimental results demonstrate substantial improvements: near-perfect mapping success rate and doubled average PE utilization compared to state-of-the-art baselines. The study delivers a reproducible benchmark suite and concrete compiler optimization guidelines for CGRA architecture-software co-design.
Abstract
Increasing demands for computing power also propel the need for energy-efficient SoC accelerator architectures. One class of such accelerators is so-called processor arrays, which typically integrate a two-dimensional mesh of interconnected processing elements (PEs). Such arrays are specifically designed to accelerate the execution of multidimensional nested loops by exploiting the intrinsic parallelism of such loops. Coarse-grained reconfigurable arrays (CGRAs) belong to this class of accelerator architectures. In this work, we analyze four toolchains for mapping loop programs onto CGRAs and compare the resulting mappings with respect to performance, i.e., latency. While most toolchains succeed on simpler kernels like general matrix multiplication, some struggle to find valid mappings for more complex loops like a triangular solver. Furthermore, we observe that the considered CGRA mappers generally tend to underutilize the available PEs.
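
To make the contrast between the two kernel classes concrete, the following is a minimal Python sketch of the loop nests in question. It is an illustration only, not the benchmark code evaluated in this work; the function names `gemm` and `forward_substitution` are hypothetical, and forward substitution on a lower-triangular system is assumed here as one representative form of a triangular solver. The point of interest is the loop bounds: GEMM iterates over a rectangular space, whereas in the triangular solver the inner bound depends on the outer index and each result depends on earlier ones.

```python
def gemm(A, B):
    """Compute C = A * B with a rectangular three-deep loop nest."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):  # all bounds are constants: regular iteration space
                C[i][j] += A[i][p] * B[p][j]
    return C


def forward_substitution(L, b):
    """Solve L x = b for a lower-triangular L (assumed non-singular)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        for j in range(i):  # inner bound depends on i: triangular iteration space
            acc -= L[i][j] * x[j]  # x[j] was produced by an earlier outer iteration
        x[i] = acc / L[i][i]
    return x
```

In this sketch, the GEMM nest executes `n * m * k` innermost iterations regardless of the data layout, while the triangular nest executes `n * (n - 1) / 2` inner iterations with a loop-carried dependence on `x`, which is one plausible reason such loops are harder to map onto a fixed PE mesh.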