Gem5-AcceSys: Enabling System-Level Exploration of Standard Interconnects for Novel Accelerators

📅 2025-02-17
🤖 AI Summary
This study addresses the insufficient co-optimization of standard interconnects (e.g., PCIe) and heterogeneous memory technologies (DDR4/5, GDDR6, HBM2) in AI accelerators such as GPUs and Data Streaming Accelerators. We propose a system-level modeling and evaluation methodology built on gem5, introducing the first configurable framework that jointly models the memory hierarchy and the interconnect. It integrates the PCIe transaction layer, multiple memory controller interfaces, and a matrix multiplication accelerator tailored to Transformer workloads. Our analysis shows, for the first time, that device-side memory is not strictly necessary: by optimizing the interconnect, host-side memory alone can achieve up to 80% of the bandwidth efficiency attainable with device-side memory, and in certain configurations it even outperforms device-side memory. These findings provide quantifiable, system-level design guidelines for trade-offs between interconnect and memory, enabling cost-effective AI accelerator system architectures.

📝 Abstract
The growing demand for efficient, high-performance processing in machine learning (ML) and image processing has made hardware accelerators, such as GPUs and Data Streaming Accelerators (DSAs), increasingly essential. These accelerators speed up ML and image processing tasks by offloading computation from the CPU to dedicated hardware, and they rely on interconnects for efficient data transfer, which makes interconnect design crucial for system-level performance. This paper introduces Gem5-AcceSys, an innovative framework for system-level exploration of standard interconnects and configurable memory hierarchies. Using a matrix multiplication accelerator tailored for transformer workloads as a case study, we evaluate PCIe performance across diverse memory types (DDR4, DDR5, GDDR6, HBM2) and configurations, including host-side and device-side memory. Our findings demonstrate that optimized interconnects can achieve up to 80% of device-side memory performance and, in some scenarios, even surpass it. These results offer actionable insights for system architects, enabling a balanced approach to performance and cost in next-generation accelerator design.
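Why host-side memory behind PCIe can come close to, or even beat, device-side memory is easiest to see with a back-of-envelope bottleneck model. The sketch below is a hypothetical illustration, not the cycle-level gem5 simulation that Gem5-AcceSys performs, and its numbers are not the paper's results. Every value is an assumption: the accelerator demand `DEMAND` (32 GB/s), the 24-byte per-TLP overhead, the 256-byte payload size, and the chosen memory configurations. The model also ignores latency, queuing, DRAM row-buffer effects, and read-request traffic. Only the peak-rate formulas (128b/130b line encoding for PCIe 3.0 to 5.0; transfer rate times bus width for DRAM) are standard.

```python
# Hypothetical bottleneck model (not Gem5-AcceSys): compare an accelerator that
# reads operands from host-side memory over PCIe with one that uses device-side memory.

def pcie_gbs(gt_per_s: float, lanes: int) -> float:
    """Raw per-direction PCIe bandwidth in GB/s with 128b/130b line encoding."""
    return gt_per_s * lanes * (128 / 130) / 8

def dram_gbs(mt_per_s: float, bus_bits: int) -> float:
    """Peak DRAM bandwidth in GB/s: transfers per second times bus width in bytes."""
    return mt_per_s * bus_bits / 8 / 1000

def tlp_efficiency(payload_bytes: int, overhead_bytes: int = 24) -> float:
    """Fraction of link bytes carrying payload; 24 B per-TLP overhead is an assumption."""
    return payload_bytes / (payload_bytes + overhead_bytes)

def achieved_gbs(demand_gbs: float, *limits: float) -> float:
    """Achieved bandwidth is capped by the accelerator's demand and every path limit."""
    return min(demand_gbs, *limits)

DEMAND = 32.0                                   # hypothetical accelerator demand, GB/s
link = pcie_gbs(16, 16) * tlp_efficiency(256)   # PCIe 4.0 x16, 256 B payloads (assumed)
host_ddr5 = dram_gbs(4800, 64) * 2              # two 64-bit DDR5-4800 channels on the host
dev_ddr4 = dram_gbs(3200, 64)                   # one 64-bit DDR4-3200 channel on the device
dev_hbm2 = dram_gbs(2000, 1024)                 # one HBM2 stack at 2 Gb/s per pin

# Host-side path: limited by demand, the PCIe link, and host DRAM.
host = achieved_gbs(DEMAND, link, host_ddr5)

if __name__ == "__main__":
    for name, dev_bw in [("DDR4", dev_ddr4), ("HBM2", dev_hbm2)]:
        # Device-side path: limited only by demand and device DRAM.
        dev = achieved_gbs(DEMAND, dev_bw)
        print(f"host-side / device-side {name}: {host / dev:.2f}")
```

Under these assumed numbers the script prints ratios of 1.13 against a single DDR4 device channel (the host path wins because the device DRAM is the bottleneck) and 0.90 against HBM2 (the PCIe link, not HBM2, caps the host path, while the accelerator's demand caps the device path). The `min()` structure is the point of the sketch: whichever of demand, link, or DRAM is smallest sets the achieved bandwidth.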
Problem

System-level exploration of interconnects
Optimization for hardware accelerators
Performance evaluation across memory types
Innovation

Gem5-AcceSys framework
Explores standard interconnects
Configurable memory hierarchies