Accelerating GNN Training through Locality-aware Dropout and Merge

📅 2025-06-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address poor DRAM access locality and low bandwidth utilization caused by irregular graph structures in GNN training, this paper proposes a hardware-algorithm co-design acceleration framework. The approach introduces two key innovations: (1) a locality-aware feature pruning mechanism that selectively discards redundant features based on hardware-aware profiling, and (2) a semantics-driven DRAM row-level memory coalescing strategy that reorganizes memory layouts according to neighborhood aggregation semantics and merges fine-grained memory accesses. Both techniques jointly enhance data locality without compromising model accuracy. Experimental evaluation demonstrates that, under a 0.5 dropout rate, the method achieves a 1.48x-3.02x speedup in training throughput, reduces total DRAM traffic by 34%-55%, and decreases DRAM row activations by 59%-82% compared to baseline implementations.

📝 Abstract
Graph Neural Networks (GNNs) have demonstrated significant success in graph learning and are widely adopted across various critical domains. However, the irregular connectivity between vertices leads to inefficient neighbor aggregation, resulting in substantial irregular and coarse-grained DRAM accesses. This lack of data locality presents significant challenges for execution platforms, ultimately degrading performance. While previous accelerator designs have leveraged on-chip memory and data access scheduling strategies to address this issue, they still inevitably access features at irregular addresses from DRAM. In this work, we propose LiGNN, a hardware-based solution that improves data locality by applying dropout and merge techniques during neighbor aggregation to accelerate GNN training. Unlike conventional algorithm-level dropout methods that primarily aim to improve accuracy while overlooking hardware costs, LiGNN introduces a locality-aware feature dropout mechanism. This approach selectively drops node features with data locality awareness, effectively reducing irregular DRAM accesses without compromising model accuracy. Moreover, by leveraging detailed knowledge of memory layout and organization, including critical alignment constraints, LiGNN strategically merges memory accesses during neighbor aggregation at the DRAM row level, guided by GNN-level semantics. This optimization significantly improves data locality with minimal additional cost. Under the commonly adopted 0.5 dropout rate, LiGNN outperforms state-of-the-art methods, delivering a 1.48~3.02x speedup, reducing DRAM accesses by 34%~55%, and lowering DRAM row activations by 59%~82%, all while maintaining model accuracy.
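The row-level merging idea described above can be illustrated with a minimal sketch. This is not the paper's hardware mechanism; it assumes a contiguous feature array in DRAM and uses illustrative row and feature sizes (`ROW_BYTES`, `FEAT_BYTES` are assumptions) to show how per-neighbor feature fetches that land in the same DRAM row can be served by a single row activation:

```python
# Hypothetical software sketch of DRAM row-level access merging during
# neighbor aggregation. Assumes node features are stored contiguously
# (feature of node i starts at i * FEAT_BYTES); sizes are illustrative.
from collections import defaultdict

ROW_BYTES = 8192   # assumed DRAM row (page) size in bytes
FEAT_BYTES = 256   # assumed per-node feature size in bytes

def coalesce_fetches(neighbor_ids):
    """Group the feature reads of one aggregation by the DRAM row they
    fall in, so each row is activated once instead of once per neighbor."""
    rows = defaultdict(list)
    for nid in neighbor_ids:
        addr = nid * FEAT_BYTES           # start address of this feature
        rows[addr // ROW_BYTES].append(nid)
    # one row activation per key; each value is a burst of same-row reads
    return rows

# Neighbors 0, 3, and 31 fall in the same row; 100 falls in another,
# so four fine-grained fetches collapse into two row activations.
groups = coalesce_fetches([0, 3, 31, 100])
```

In the paper this merging is guided by GNN-level aggregation semantics and done in hardware; the sketch only shows the address arithmetic behind the row-activation savings.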
Problem

Research questions and friction points this paper is trying to address.

Improving data locality in GNN training to reduce irregular DRAM accesses
Accelerating GNN training with locality-aware dropout and merge techniques
Maintaining model accuracy while optimizing memory access patterns
Innovation

Methods, ideas, or system contributions that make the work stand out.

Locality-aware feature dropout that reduces irregular DRAM accesses
Semantics-guided merging of memory accesses at the DRAM row level
Hardware-based co-design that accelerates GNN training without accuracy loss
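To make the locality-aware dropout idea concrete, here is a minimal sketch under the same assumptions as above (contiguous feature layout, illustrative `ROW_BYTES`/`FEAT_BYTES`); it is an interpretation of the technique, not LiGNN's actual selection policy. Given a target dropout rate, it preferentially drops neighbors whose features occupy sparsely used DRAM rows, so whole row activations are eliminated, and tops up with random drops to hit the rate:

```python
# Hypothetical sketch of locality-aware feature dropout: drop ~rate of
# neighbor features, preferring neighbors in sparsely occupied DRAM rows
# (dropping a whole row's worth of fetches saves an entire activation).
import random
from collections import defaultdict

ROW_BYTES, FEAT_BYTES = 8192, 256   # illustrative DRAM/feature sizes

def locality_aware_dropout(neighbor_ids, rate, seed=0):
    rng = random.Random(seed)
    rows = defaultdict(list)
    for nid in neighbor_ids:
        rows[(nid * FEAT_BYTES) // ROW_BYTES].append(nid)
    # sparsest rows first: eliminating them saves an activation per few reads
    order = sorted(rows.values(), key=len)
    to_drop = int(len(neighbor_ids) * rate)
    dropped = []
    for group in order:
        if len(dropped) + len(group) <= to_drop:
            dropped.extend(group)       # skip this row entirely
    survivors = [n for n in neighbor_ids if n not in dropped]
    dropped.extend(rng.sample(survivors, to_drop - len(dropped)))
    return [n for n in neighbor_ids if n not in dropped]
```

Conventional dropout would pick the dropped neighbors uniformly at random; the only change here is the row-occupancy ordering, which is what converts a fixed dropout budget into fewer row activations.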
Gongjian Sun
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China
Mingyu Yan
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China
Dengke Han
Institute of Computing Technology, Chinese Academy of Sciences
graph-based hardware accelerator, high-throughput computer architecture
Runzhen Xue
Institute of Computing Technology, Chinese Academy of Sciences
AI for Architecture, Design Space Exploration, Domain-Specific Accelerator
Duo Wang
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China
Xiaochun Ye
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China
Dongrui Fan
Institute of Computing Technology, Chinese Academy of Sciences
Computer Architecture, Processor Design, Many-core Design