CODO: An Automated Compiler for Comprehensive Dataflow Optimization

📅 2026-04-14
📈 Citations: 0
Influential: 0

🤖 AI Summary
Manually constructing efficient dataflow architectures on FPGAs is hard because performance must be balanced against resource utilization, and this remains true even with high-level synthesis (HLS) tools. This work proposes CODO, the first end-to-end compiler that automatically detects and repairs dataflow violations at both coarse and fine granularity, jointly optimizes on-chip and off-chip data movement, and generates efficient schedules. By integrating dataflow compliance analysis, communication optimization, and automated scheduling, CODO achieves 1.45–4.52× latency speedups on representative compute kernels and 3.7–33.8× speedups on DNN models in synthesis, compared with state-of-the-art frameworks. Board-level evaluations show an average 7.3× speedup for CNNs and 2.07× for GPT-2, again over existing frameworks.
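To make the idea of a "dataflow violation" concrete, here is a minimal Python sketch under stated assumptions; it is not CODO's detection algorithm. It assumes a hypothetical task-graph model (`Task`, `coarse_grained_violations`, and `fine_grained_violations` are illustrative names, not CODO's API) and uses two common HLS dataflow constraints as stand-ins for the two granularities: a coarse-grained check for buffers with more than one producer or consumer task (the single-producer/single-consumer rule of HLS dataflow regions), and a fine-grained check for producer/consumer pairs whose element access orders differ, which prevents replacing the buffer with a FIFO stream.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical dataflow task: buffers it writes/reads, each with an element access order."""
    name: str
    writes: dict = field(default_factory=dict)  # buffer name -> tuple of element indices, in write order
    reads: dict = field(default_factory=dict)   # buffer name -> tuple of element indices, in read order


def coarse_grained_violations(tasks):
    """Report buffers shared by more than one producer or more than one consumer task."""
    producers, consumers = defaultdict(list), defaultdict(list)
    for t in tasks:
        for buf in t.writes:
            producers[buf].append(t.name)
        for buf in t.reads:
            consumers[buf].append(t.name)
    issues = []
    for buf in sorted(set(producers) | set(consumers)):
        if len(producers[buf]) > 1:
            issues.append((buf, "multiple producers", producers[buf]))
        if len(consumers[buf]) > 1:
            issues.append((buf, "multiple consumers", consumers[buf]))
    return issues


def fine_grained_violations(tasks):
    """Report producer/consumer pairs whose access orders differ, so the buffer cannot become a FIFO."""
    # If a buffer has several producers (a coarse-grained issue), the last one listed wins here.
    write_order = {buf: (t.name, order) for t in tasks for buf, order in t.writes.items()}
    issues = []
    for t in tasks:
        for buf, order in t.reads.items():
            if buf in write_order and write_order[buf][1] != order:
                issues.append((buf, write_order[buf][0], t.name))
    return issues


# Usage: a 2x2 buffer A is written row-major but read column-major (a transpose),
# and buffer B feeds two consumer tasks.
row_major = (0, 1, 2, 3)
col_major = (0, 2, 1, 3)
tasks = [
    Task("load", writes={"A": row_major}),
    Task("transpose", reads={"A": col_major}, writes={"B": row_major}),
    Task("relu", reads={"B": row_major}, writes={"C": row_major}),
    Task("sum", reads={"B": row_major}),
]
print(coarse_grained_violations(tasks))  # [('B', 'multiple consumers', ['relu', 'sum'])]
print(fine_grained_violations(tasks))    # [('A', 'load', 'transpose')]
```

Typical repairs for such cases are duplicating the shared buffer so each consumer gets its own copy, or keeping a reordered access behind an on-chip buffer instead of a stream; how CODO actually repairs violations is described in the paper, not here.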

📝 Abstract
FPGAs are well-suited for dataflow architectures that process data in a streaming or pipelined manner, thus satisfying the high computational and communication demands of emerging applications. However, manually implementing an efficient dataflow architecture for large-scale applications is still challenging, even for specialists who use high-level synthesis (HLS) to simplify FPGA programming. To address this, we introduce CODO, an automated compiler that generates feasible and efficient dataflow accelerators on FPGAs. CODO features a systematic method for detecting and eliminating both coarse-grained and fine-grained dataflow violations. Building on this, CODO performs both on- and off-chip data movement optimizations to maximize transfer efficiency. To further improve design quality, CODO performs automatic scheduling to generate high-performance dataflow accelerators with a balanced performance-resource trade-off. Synthesis results show that CODO delivers $1.45\times$ to $4.52\times$ latency speedups on typical computation kernels and $3.7\times$ to $33.8\times$ speedups on DNN models compared to SOTA frameworks. In on-board evaluations, CODO achieves $7.3\times$ average speedup on CNN models and $2.07\times$ average speedup on the GPT-2 model over SOTA frameworks. The compiler is open-sourced at https://github.com/sjtu-zhao-lab/codo-artifact.
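The performance-resource trade-off that the scheduler must balance can be illustrated with a textbook greedy heuristic; this is a stand-in, not CODO's scheduling algorithm. The sketch assumes that a dataflow pipeline's steady-state interval is set by its slowest stage, that a stage's latency shrinks as `ceil(base_latency / factor)` when its parallelism factor grows, and that DSPs are the only constrained resource. `Stage`, `balance`, and all numbers below are hypothetical and illustrative, not results from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Stage:
    """Hypothetical dataflow stage with its latency and DSP cost at parallelism factor 1."""
    name: str
    base_latency: int  # cycles per input at factor 1
    base_dsp: int      # DSPs used at factor 1
    factor: int = 1    # parallelism, e.g. an unroll factor

    @property
    def latency(self) -> int:
        return math.ceil(self.base_latency / self.factor)

    @property
    def dsp(self) -> int:
        return self.base_dsp * self.factor


def balance(stages, dsp_budget, max_factor=64):
    """Greedily double the slowest stage's parallelism while the DSP budget allows."""
    while True:
        bottleneck = max(stages, key=lambda s: s.latency)  # ties: first stage in list order
        new_factor = bottleneck.factor * 2
        extra_dsp = bottleneck.base_dsp * (new_factor - bottleneck.factor)
        used_dsp = sum(s.dsp for s in stages)
        if (bottleneck.latency == 1 or new_factor > max_factor
                or used_dsp + extra_dsp > dsp_budget):
            return stages  # the bottleneck cannot be sped up further, so the interval is final
        bottleneck.factor = new_factor


stages = balance([Stage("conv", 1024, 8), Stage("pool", 256, 1), Stage("fc", 512, 4)], dsp_budget=64)
print([(s.name, s.factor, s.latency) for s in stages])  # [('conv', 4, 256), ('pool', 1, 256), ('fc', 2, 256)]
print(max(s.latency for s in stages), sum(s.dsp for s in stages))  # 256 41
```

The loop stops as soon as the bottleneck stage cannot be accelerated, because spending resources on any other stage would not shorten the pipeline interval; here all three stages end up at 256 cycles while using 41 of the 64 DSPs.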
Problem

Research questions and friction points this paper is trying to address.

FPGA
dataflow optimization
automated compiler
high-level synthesis
data movement
Innovation

Methods, ideas, or system contributions that make the work stand out.

dataflow optimization
FPGA compiler
high-level synthesis
automated scheduling
data movement optimization
Weichuang Zhang
School of Computer Science, Shanghai Jiao Tong University, Shanghai, China
Yiquan Wang
School of Computer Science, Shanghai Jiao Tong University, Shanghai, China
Xinzhou Zhang
School of Computer Science, Shanghai Jiao Tong University, Shanghai, China
Chi Zhang
School of Computer Science, Shanghai Jiao Tong University, Shanghai, China
Yu Feng
Shanghai Jiao Tong University
Computer Architecture
Xiaofeng Hou
School of Computer Science, Shanghai Jiao Tong University, Shanghai, China
Chao Li
Professor, Shanghai Jiao Tong University
Computer Architecture, Cloud/Edge Computing, Autonomous Systems, IT Sustainability
Jieru Zhao
Associate Professor, Shanghai Jiao Tong University
Hardware-software co-design, AI acceleration and systems, Compiler, FPGA, High-level synthesis
Minyi Guo
IEEE Fellow, Chair Professor, Shanghai Jiao Tong University
Parallel Computing, Compiler Optimization, Cloud Computing, Networking, Big Data