🤖 AI Summary
To address poor DRAM access locality and low bandwidth utilization caused by irregular graph structures in GNN training, this paper proposes LiGNN, a hardware-algorithm co-designed acceleration framework. The approach introduces two key innovations: (1) a locality-aware feature dropout mechanism that selectively drops node features based on hardware-aware profiling, and (2) a semantics-driven, DRAM row-level memory coalescing strategy that reorganizes memory layouts according to neighborhood aggregation semantics and merges fine-grained memory accesses. Both techniques jointly enhance data locality without compromising model accuracy. Experimental evaluation demonstrates that, under a 0.5 dropout rate, the method achieves a 1.48×–3.02× speedup in training throughput, reduces total DRAM traffic by 34%–55%, and decreases DRAM row activations by 59%–82% compared to baseline implementations.
📝 Abstract
Graph Neural Networks (GNNs) have demonstrated significant success in graph learning and are widely adopted across various critical domains. However, the irregular connectivity between vertices leads to inefficient neighbor aggregation, resulting in substantial irregular and coarse-grained DRAM accesses. This lack of data locality presents significant challenges for execution platforms, ultimately degrading performance. While previous accelerator designs have leveraged on-chip memory and data access scheduling strategies to address this issue, they still inevitably access features at irregular addresses from DRAM. In this work, we propose LiGNN, a hardware-based solution that improves data locality by applying dropout and merge techniques during neighbor aggregation to accelerate GNN training. Unlike conventional algorithm-level dropout methods that primarily aim to improve accuracy while overlooking hardware costs, LiGNN introduces a locality-aware feature dropout mechanism. This approach selectively drops node features with data locality awareness, effectively reducing irregular DRAM accesses without compromising model accuracy. Moreover, by leveraging detailed knowledge of memory layout and organization, including critical alignment constraints, LiGNN strategically merges memory accesses during neighbor aggregation at the DRAM row level, guided by GNN-level semantics. This optimization significantly improves data locality with minimal additional cost. Under the commonly adopted 0.5 dropout rate, LiGNN outperforms state-of-the-art methods, delivering a 1.48×–3.02× speedup, reducing DRAM accesses by 34%–55%, and lowering DRAM row activations by 59%–82%, all while maintaining model accuracy.
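To make the core idea concrete, the sketch below illustrates *locality-aware* feature dropout in software, as opposed to purely random dropout. It is a hypothetical illustration, not LiGNN's actual hardware policy: the DRAM row size, feature size, and the contiguous-by-node-ID feature layout are all assumptions. Given a neighbor list, it preferentially drops neighbors whose features sit in DRAM rows shared with few other neighbors, so the surviving accesses cluster into fewer row activations.

```python
import random
from collections import defaultdict

# Assumed memory parameters (illustrative, not from the paper).
DRAM_ROW_BYTES = 2048       # bytes per DRAM row
FEATURE_BYTES = 256         # per-node feature vector, e.g. 64 x float32
NODES_PER_ROW = DRAM_ROW_BYTES // FEATURE_BYTES  # features per DRAM row

def locality_aware_dropout(neighbors, rate=0.5, seed=0):
    """Drop `rate` of the neighbors, preferring those whose features live
    in sparsely shared DRAM rows (i.e. the accesses with worst locality).
    Assumes node v's feature starts at byte offset v * FEATURE_BYTES."""
    # Group neighbors by the DRAM row their feature vector occupies.
    rows = defaultdict(list)
    for v in neighbors:
        rows[v // NODES_PER_ROW].append(v)
    # Order candidates so neighbors in lightly populated rows come first;
    # the RNG only breaks ties among equally (un)clustered neighbors.
    rng = random.Random(seed)
    order = sorted(neighbors,
                   key=lambda v: (len(rows[v // NODES_PER_ROW]), rng.random()))
    n_drop = int(len(neighbors) * rate)
    kept = set(order[n_drop:])
    # Preserve the original neighbor order among the survivors.
    return [v for v in neighbors if v in kept]

def row_activations(neighbors):
    """Count distinct DRAM rows touched when fetching these features."""
    return len({v // NODES_PER_ROW for v in neighbors})
```

For example, with eight neighbors where four share one DRAM row and four are scattered, a 0.5 dropout rate removes the four scattered ones, shrinking the fetch from five row activations to one, whereas random dropout would on average still touch several rows.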