AutoGNN: End-to-End Hardware-Driven Graph Preprocessing for Enhanced GNN Performance

📅 2026-01-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance bottleneck caused by preprocessing in graph neural network (GNN) inference, which often dominates overall latency. To this end, the authors propose AutoGNN, an end-to-end FPGA-based hardware accelerator that efficiently executes preprocessing tasks such as graph transformation and sampling through a reconfigurable architecture. The key innovation is the co-design of a Unified Processing Element (UPE) and a Single-Cycle Reducer (SCR), which together integrate parallel and sequential computation while enabling runtime dynamic reconfiguration to adapt to diverse graph structures. Implemented on a 7 nm enterprise-grade FPGA, the system achieves up to 9.0× and 2.1× speedup in preprocessing over conventional CPU and GPU baselines, respectively.

📝 Abstract
Graph neural network (GNN) inference faces significant bottlenecks in preprocessing, which often dominate overall inference latency. We introduce AutoGNN, an FPGA-based accelerator designed to address these challenges by leveraging FPGA's reconfigurability and specialized components. AutoGNN adapts to diverse graph inputs, efficiently performing computationally intensive tasks such as graph conversion and sampling. By utilizing components like adder trees, AutoGNN executes reduction operations in constant time, overcoming the limitations of serialization and synchronization on GPUs. AutoGNN integrates unified processing elements (UPEs) and single-cycle reducers (SCRs) to streamline GNN preprocessing. UPEs enable scalable parallel processing for edge sorting and unique vertex selection, while SCRs efficiently handle sequential tasks such as pointer array construction and subgraph reindexing. A user-level software framework dynamically profiles graph inputs, determines optimal configurations, and reprograms AutoGNN to handle varying workloads. Implemented on a 7 nm enterprise FPGA, AutoGNN achieves up to 9.0× and 2.1× speedup compared to conventional and GPU-accelerated preprocessing systems, respectively, enabling high-performance GNN preprocessing across diverse datasets.
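For readers unfamiliar with the preprocessing steps the abstract names, the following is a minimal software sketch of what they compute: filtering a sampled subgraph, unique-vertex selection with reindexing, edge sorting, and CSR pointer-array construction. This is an illustrative NumPy reference only, written by us; AutoGNN performs these steps in hardware (UPEs for the parallel sort/selection, SCRs for the sequential pointer-array and reindexing work), and all function and variable names here are our own assumptions, not the paper's.

```python
import numpy as np

def preprocess_subgraph(edges, sampled_vertices):
    """Sketch of GNN preprocessing on a sampled subgraph.

    edges: (E, 2) array of (src, dst) pairs with global vertex IDs.
    sampled_vertices: 1-D array of global IDs kept by the sampler.
    Returns (uniq, ptr, col): the unique global IDs and a CSR view
    of the reindexed subgraph.
    """
    # 1. Keep only edges whose endpoints both survive sampling.
    keep = (np.isin(edges[:, 0], sampled_vertices)
            & np.isin(edges[:, 1], sampled_vertices))
    sub = edges[keep]

    # 2. Unique-vertex selection + subgraph reindexing:
    #    map global IDs to a dense range 0..n-1.
    uniq, inv = np.unique(sub, return_inverse=True)
    sub = inv.reshape(sub.shape)

    # 3. Edge sorting by source vertex (parallel work in hardware).
    sub = sub[np.argsort(sub[:, 0], kind="stable")]

    # 4. Pointer-array (CSR row-offset) construction: a per-vertex
    #    count followed by a prefix sum -- the inherently sequential
    #    reduction step a hardware reducer can collapse.
    counts = np.bincount(sub[:, 0], minlength=len(uniq))
    ptr = np.concatenate(([0], np.cumsum(counts)))
    return uniq, ptr, sub[:, 1]
```

For example, `preprocess_subgraph(np.array([[5, 7], [7, 5], [5, 9], [9, 7]]), np.array([5, 7, 9]))` reindexes vertices 5, 7, 9 to 0, 1, 2 and yields the row offsets `[0, 2, 3, 4]`.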
Problem

Research questions and friction points this paper is trying to address.

Graph Neural Network
Preprocessing Bottleneck
Inference Latency
Graph Processing
Hardware Acceleration
Innovation

Methods, ideas, or system contributions that make the work stand out.

FPGA acceleration
Graph Neural Networks
Hardware-Driven Preprocessing
Unified Processing Elements
Single-Cycle Reducers
Seungkwan Kang
Graduate Student of Electrical Engineering (EE), KAIST
Computer Architecture
Seungjun Lee
Computer Architecture and Memory Systems Laboratory, KAIST
Donghyun Gouk
Panmnesia, Inc.
Miryeong Kwon
Panmnesia, Inc.
Hyunkyu Choi
Panmnesia, Inc.
Junhyeok Jang
Panmnesia, Inc.
Sangwon Lee
Panmnesia, Inc.
Huiwon Choi
Computer Architecture and Memory Systems Laboratory, KAIST
Jie Zhang
Assistant Professor, School of Computer Science, Peking University
Computer Architecture, Storage System, GPU
Wonil Choi
Hanyang University
Mahmut Taylan Kandemir
Pennsylvania State University
Myoungsoo Jung
The KAIST Endowed Chair Professor | Full Professor, Department of Electrical Engineering, KAIST
Computer Architecture, Solid State Drive, Non-Volatile Memory, CXL, Operating Systems