NEURA: A Unified and Retargetable Compilation Framework for Coarse-Grained Reconfigurable Architectures

📅 2026-04-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the difficulty coarse-grained reconfigurable architectures (CGRAs) have in efficiently supporting compute kernels with complex control flow, which stems from the mismatch between their data-driven execution model and control-flow semantics. The paper proposes NEURA, a unified and retargetable compilation framework built on a pure dataflow intermediate representation (IR) with a predicated type system. By embedding control context as a predicate in each data value, the framework systematically flattens control flow into a single unified dataflow graph, making control an intrinsic property of data and decoupling kernel representation from the underlying hardware. Targeted to a high-performance spatio-temporal CGRA, the approach achieves a 2.20× speedup on kernel benchmarks and up to a 2.71× geometric mean speedup on real-world applications over state-of-the-art high-performance baselines. Retargeted to a spatial-only CGRA, it is competitive with the state-of-the-art low-power CGRA.
📝 Abstract
Coarse-Grained Reconfigurable Architectures (CGRAs) are a promising and versatile accelerator platform, balancing the performance and efficiency of specialized accelerators with software programmability. However, their full potential is severely hindered by control flow in accelerated kernels, as control flow (e.g., loops, branches) is fundamentally incompatible with the parallel, data-driven CGRA fabric. Prior strategies to resolve this mismatch in CGRA kernel acceleration are either inefficient, sacrificing performance for generality, or lack generality because they are difficult to adapt across different execution models. Thus, a general and unified solution for efficient CGRA kernel acceleration remains elusive. This paper introduces NEURA, a unified and retargetable compilation framework that systematically resolves the control-dataflow mismatch in CGRAs. NEURA's core innovation is a novel, pure dataflow intermediate representation (IR) built on a predicated type system. In this IR, the control context is embedded as a predicate within each data value, making control an intrinsic property of data. This mechanism enables NEURA to systematically flatten complex control flow into a single unified dataflow graph. This unified representation decouples kernel representation from hardware, empowering NEURA to retarget diverse CGRAs with different execution models and microarchitectural features. When targeted to a high-performance spatio-temporal CGRA, NEURA delivers a 2.20x speedup on kernel benchmarks and up to 2.71x geometric mean speedup on real-world applications over state-of-the-art (SOTA) high-performance baselines. It also provides a competitive solution against the SOTA low-power CGRA when retargeted to a spatial-only CGRA. NEURA is open-source and available at https://github.com/coredac/neura.
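The idea of carrying control context as a predicate on each data value can be illustrated with a small sketch. This is not NEURA's actual IR or API: the `Pred` type and the `gate` and `merge` helpers are hypothetical names chosen for illustration, and the example only assumes that an operation's result is valid when all of its inputs are valid. It shows how a two-way branch might be flattened so that both arms live in one dataflow graph and a merge forwards whichever result carries a true predicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pred:
    """A data value tagged with a validity predicate (hypothetical type)."""
    value: int
    valid: bool = True


def add(a: Pred, b: Pred) -> Pred:
    # The result is valid only if both operands are valid.
    return Pred(a.value + b.value, a.valid and b.valid)


def sub(a: Pred, b: Pred) -> Pred:
    # Same predicate propagation rule as add.
    return Pred(a.value - b.value, a.valid and b.valid)


def gate(x: Pred, cond: Pred, when: bool) -> Pred:
    # Keep x valid only if the condition is valid and equals `when`.
    return Pred(x.value, x.valid and cond.valid and (bool(cond.value) == when))


def merge(a: Pred, b: Pred) -> Pred:
    # Forward whichever input is valid; yield an invalid value if neither is.
    if a.valid:
        return a
    if b.valid:
        return b
    return Pred(0, False)


def kernel_branching(c: bool, a: int, b: int) -> int:
    # Original control-flow form of the kernel.
    return a + b if c else a - b


def kernel_flattened(c: bool, a: int, b: int) -> Pred:
    # Flattened form: both arms are computed as dataflow nodes, and the
    # branch condition survives only as predicates on the data.
    cond, pa, pb = Pred(int(c)), Pred(a), Pred(b)
    then_val = add(gate(pa, cond, True), gate(pb, cond, True))
    else_val = sub(gate(pa, cond, False), gate(pb, cond, False))
    return merge(then_val, else_val)
```

In this sketch, `kernel_flattened(c, a, b).value` matches `kernel_branching(c, a, b)` for either value of `c`, while the flattened version contains no branch between dataflow nodes. The arm that was not taken still produces a value, but that value carries a false predicate and is discarded by `merge`.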
Problem

Research questions and friction points this paper is trying to address.

Coarse-Grained Reconfigurable Architectures
control flow
dataflow
compilation framework
kernel acceleration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Coarse-Grained Reconfigurable Architecture
Dataflow Intermediate Representation
Predicated Type System
Control-Dataflow Decoupling
Retargetable Compilation
Shangkun Li
The Hong Kong University of Science and Technology, Hong Kong
Jinming Ge
The Hong Kong University of Science and Technology, Hong Kong
Diyuan Tao
Independent Researcher, China
Zeyu Li
Hong Kong University of Science and Technology (Guangzhou)
GPU, High Performance Compute
Jiawei Liang
The Hong Kong University of Science and Technology, Hong Kong
Linfeng Du
The Hong Kong University of Science and Technology, Hong Kong
Jiang Xu
Hong Kong University of Science and Technology (Guangzhou)
MPSoC, Network on Chip, HW/SW-Codesign, Optical Neural Network
Wei Zhang
Hong Kong University of Science and Technology
Embedded system, Reconfigurable computing, Multicore system, Nanoelectronics
Cheng Tan
Google, Arizona State University
Computer Architecture