CMOS+X: Stacking Persistent Embedded Memories based on Oxide Transistors upon GPGPU Platforms

📅 2025-06-29
🤖 AI Summary
GPGPU performance is constrained by the bandwidth and capacity of register files and last-level caches. Conventional SRAM offers diminishing returns because bit-cell scaling has stalled and static power is high, which limits gains in compute density. This work proposes integrating amorphous oxide semiconductor (AOS) transistors into capacitive, persistent memory structures (1T1C eDRAM and 2T0C/3T0C gain cells) to realize low-leakage, high-density, multi-ported embedded memory. The memories are integrated in the back end of line (BEOL) using monolithic 3D integration, and the study weighs their density against their energy cost. The key contribution is a multi-ported AOS gain cell that exploits the short lifetime of register operands to deliver 3× the read ports in roughly 76% of the SRAM footprint with over 70% lower standby power. Evaluation on a validated NVIDIA Ampere-class GPU model, using a modified version of Accel-Sim, shows up to 5.2× higher performance per watt and 8% higher geometric-mean instructions per cycle (IPC); the denser memory also leaves room for larger warp sizes or higher processor counts.

📝 Abstract
In contemporary general-purpose graphics processing units (GPGPUs), the continued increase in raw arithmetic throughput is constrained by the capabilities of the register file (single-cycle) and last-level cache (high bandwidth), which must deliver operands at the cadence demanded by wide single-instruction multiple-data (SIMD) lanes. Enhancing the capacity, density, or bandwidth of these memories can unlock substantial performance gains; however, the recent stagnation of SRAM bit-cell scaling leads to inequivalent losses in compute density. To address the challenges posed by SRAM's scaling and leakage power consumption, this paper explores the potential CMOS+X integration of amorphous oxide semiconductor (AOS) transistors in capacitive, persistent memory topologies (e.g., 1T1C eDRAM, 2T0C/3T0C gain cells) as alternative cells in multi-ported and high-bandwidth banked GPGPU memories. A detailed study of the density and energy tradeoffs of back-end-of-line (BEOL) integrated memories utilizing monolithic 3D (M3D)-integrated multiplexed arrays is conducted, while accounting for the macro-level limitations of integrating AOS candidate structures proposed by the device community (an aspect often overlooked in prior work). By exploiting the short lifetime of register operands, we propose a multi-ported AOS gain cell capable of delivering 3x the read ports in ~76% of the footprint of SRAM with over 70% lower standby power, enabling enhancements to compute capacity, such as larger warp sizes or processor counts. Benchmarks run on a validated NVIDIA Ampere-class GPU model, using a modified version of Accel-Sim, demonstrate improvements of up to 5.2x in performance per watt and an 8% higher geometric-mean instructions per cycle (IPC) on various compute- and memory-bound tasks.
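
The headline figures in the abstract can be combined with a little arithmetic. The sketch below is illustrative only: it takes the quoted ratios (3x read ports, ~76% footprint) at face value to estimate read ports per unit area relative to SRAM, and shows how a geometric-mean IPC ratio is computed. The function names and the per-benchmark IPC ratios are hypothetical and do not come from the paper.

```python
import math

# Ratios quoted in the abstract, relative to an SRAM baseline.
READ_PORT_RATIO = 3.0    # "3x the read ports"
FOOTPRINT_RATIO = 0.76   # "~76% of the footprint of SRAM"


def read_ports_per_area(port_ratio: float, footprint_ratio: float) -> float:
    """Read ports per unit area, normalized so that SRAM = 1.0."""
    return port_ratio / footprint_ratio


def geometric_mean(ratios: list[float]) -> float:
    """Geometric mean of per-benchmark ratios (new IPC / baseline IPC)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


port_density = read_ports_per_area(READ_PORT_RATIO, FOOTPRINT_RATIO)
print(f"read ports per unit area vs. SRAM: {port_density:.2f}x")

# Hypothetical per-benchmark IPC ratios, for illustration only (not paper data).
example_ipc_ratios = [1.00, 1.05, 1.20]
print(f"geometric-mean IPC ratio: {geometric_mean(example_ipc_ratios):.3f}")
```

Under these assumptions the gain cell offers roughly four times the read ports per unit area of SRAM. The geometric mean is the usual choice for averaging speedup ratios because it treats a 2x gain and a 0.5x loss symmetrically.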
Problem

Research questions and friction points this paper is trying to address.

Address SRAM scaling and leakage power limitations in GPGPUs
Explore oxide semiconductor transistors for high-bandwidth memory alternatives
Enhance compute capacity and energy efficiency in GPU architectures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrating AOS transistors in GPGPU memories
Monolithic 3D multiplexed arrays for density
Multi-ported AOS gain cell with 3x the read ports in ~76% of the SRAM footprint and over 70% lower standby power