🤖 AI Summary
This work addresses the difficulty that coarse-grained reconfigurable architectures (CGRAs) have in efficiently supporting compute kernels with complex control flow, which stems from the mismatch between their data-driven execution model and control-flow semantics. The paper proposes NEURA, a unified and retargetable compilation framework built around a novel, pure dataflow intermediate representation (IR) based on a predicated type system. By embedding control context into a predicate carried by each data value, the framework systematically flattens control flow into a single unified dataflow graph, making control an intrinsic property of data and decoupling kernel representation from the underlying hardware. Targeted to a high-performance spatio-temporal CGRA, the approach achieves a 2.20× speedup on kernel benchmarks and up to 2.71× geometric mean speedup on real-world applications over state-of-the-art high-performance baselines. When retargeted to a spatial-only CGRA, it is competitive with the state-of-the-art low-power CGRA.
📝 Abstract
Coarse-Grained Reconfigurable Architectures (CGRAs) are a promising and versatile accelerator platform, balancing the performance and efficiency of specialized accelerators with software programmability. However, their full potential is severely hindered by control flow in accelerated kernels, because control flow (e.g., loops, branches) is fundamentally incompatible with the parallel, data-driven CGRA fabric. Prior strategies for resolving this mismatch are either inefficient, sacrificing performance for generality, or lack generality because they are difficult to adapt across different execution models. Thus, a general and unified solution for efficient CGRA kernel acceleration remains elusive.
This paper introduces NEURA, a unified and retargetable compilation framework that systematically resolves the control-dataflow mismatch in CGRAs. NEURA's core innovation is a novel, pure dataflow intermediate representation (IR) built on a predicated type system. In this IR, the control context is embedded as a predicate within each data value, making control an intrinsic property of data. This mechanism enables NEURA to systematically flatten complex control flow into a single unified dataflow graph. The unified representation decouples kernel representation from hardware, allowing NEURA to retarget to diverse CGRAs with different execution models and microarchitectural features. When targeted to a high-performance spatio-temporal CGRA, NEURA delivers a 2.20x speedup on kernel benchmarks and up to 2.71x geometric mean speedup on real-world applications over state-of-the-art (SOTA) high-performance baselines. It also provides a competitive solution against the SOTA low-power CGRA when retargeted to a spatial-only CGRA. NEURA is open-source and available at https://github.com/coredac/neura.
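To make the idea of predicated data concrete, the following is a minimal Python sketch of how an if/else might be flattened into straight-line dataflow when every value carries a validity predicate. It is an illustration only, not NEURA's IR or API: the `Pred` type and the `gate`, `add`, `mul`, and `merge` helpers are hypothetical names chosen for this example, and it assumes that an operation's result is valid only when all of its operands are valid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pred:
    """A data value tagged with a validity predicate (hypothetical type)."""
    value: int
    valid: bool = True


def gate(x: Pred, cond: bool) -> Pred:
    # Fold a control condition into the value's predicate.
    return Pred(x.value, x.valid and cond)


def add(a: Pred, b: Pred) -> Pred:
    # The result is valid only if both operands are valid.
    return Pred(a.value + b.value, a.valid and b.valid)


def mul(a: Pred, b: Pred) -> Pred:
    # Same predicate propagation rule as add.
    return Pred(a.value * b.value, a.valid and b.valid)


def merge(a: Pred, b: Pred) -> Pred:
    # Forward whichever input is valid; with complementary gates
    # at most one of the two inputs is valid.
    return a if a.valid else b


def kernel(x: int) -> Pred:
    # Control-flow form:  y = x + 1 if x > 0 else x * 2
    # Flattened form: both sides are always present in the graph,
    # and the predicates decide which result is meaningful.
    xv = Pred(x)
    cond = x > 0
    then_out = add(gate(xv, cond), Pred(1))
    else_out = mul(gate(xv, not cond), Pred(2))
    return merge(then_out, else_out)


if __name__ == "__main__":
    print(kernel(3))
    print(kernel(-2))
```

In this sketch there is no branch in `kernel` itself: the condition appears only as data feeding the `gate` nodes, which is the sense in which control becomes a property of the values rather than of the program structure. How a real compiler maps such nodes onto a specific CGRA is outside the scope of this example.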