🤖 AI Summary
Existing DDR4 memory performance evaluation frameworks for datacenter FPGAs lack a unified benchmark platform that supports dynamic, complex memory access patterns. To address this, we design and implement a reconfigurable DDR4 performance testing platform on the AMD Kintex UltraScale XCKU115 FPGA. The platform introduces a multi-level memory access pattern generator that is programmable at run time, combined with parameterized controllers and a multi-channel parallel test architecture. It supports up to three memory channels and data rates from 1600 to 2400 MT/s, enabling consistent cross-channel and cross-rate measurement of bandwidth, latency, and burst efficiency. An extensive experimental campaign across these channel counts, data rates, and traffic configurations demonstrates that the platform can effectively evaluate DDR4 memory performance. By providing scalable, hardware-aware benchmarking, it serves as a foundational tool for memory optimization in high-bandwidth heterogeneous computing systems.
📝 Abstract
FPGAs are increasingly utilized in data centers due to their capacity to exploit data parallelism in computationally intensive workloads. Furthermore, the processing of modern data center workloads requires moving vast amounts of data, making it essential to optimize data exchange between FPGAs and memory. This paper introduces a novel benchmarking platform for the evaluation of DDR4 memory performance in data-center-class FPGAs. The proposed solution features highly configurable traffic generation with complex memory access patterns defined at run time and can be flexibly instantiated on the target FPGA to support multiple memory channels and varying data rates. An extensive experimental campaign, targeting the AMD Kintex UltraScale 115 FPGA and encompassing up to three memory channels with data rates ranging from 1600 to 2400 MT/s and various memory traffic configurations, demonstrates the benchmarking platform's capability to effectively evaluate DDR4 memory performance.
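As a rough illustration of what the quoted data rates mean for bandwidth, the sketch below computes the theoretical peak bandwidth of a DDR4 configuration and the efficiency of a measured result against that peak. It is not taken from the paper: the 64-bit data bus per channel is an assumption (the abstract does not state the bus width), and the function names are hypothetical.

```python
def peak_bandwidth_gbs(data_rate_mts: float,
                       channels: int = 1,
                       bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    data_rate_mts  : DDR4 data rate in mega-transfers per second.
    channels       : number of independent memory channels.
    bus_width_bits : data bus width per channel (ASSUMED 64 bits here;
                     the abstract does not specify it).
    """
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


def bandwidth_efficiency(measured_gbs: float,
                         data_rate_mts: float,
                         channels: int = 1,
                         bus_width_bits: int = 64) -> float:
    """Fraction of the theoretical peak achieved by a measured bandwidth."""
    peak = peak_bandwidth_gbs(data_rate_mts, channels, bus_width_bits)
    return measured_gbs / peak


if __name__ == "__main__":
    # Sweep the data-rate range and channel counts named in the abstract.
    for rate in (1600, 2400):
        for ch in (1, 2, 3):
            print(f"{rate} MT/s, {ch} channel(s): "
                  f"{peak_bandwidth_gbs(rate, ch):.1f} GB/s peak")
```

Under the 64-bit assumption, one channel at 2400 MT/s peaks at 19.2 GB/s. Measured bandwidth falls below the theoretical peak in practice because of refresh, bank conflicts, and read/write turnaround, and the size of that gap for a given access pattern is what a benchmarking platform of this kind is meant to quantify.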