A Full-Stack Performance Evaluation Infrastructure for 3D-DRAM-based LLM Accelerators

📅 2026-04-09
🤖 AI Summary
Existing 3D-DRAM accelerators lack open-source, general-purpose full-stack evaluation tools capable of supporting diverse LLM inference scenarios. To address this gap, this work proposes ATLAS, the first silicon-validated full-stack simulation framework built on commercial hybrid-bonding 3D-DRAM technology. ATLAS models performance for arbitrary LLM inference scenarios through unified abstractions of the system architecture and programming primitives, and the authors plan to open-source it upon publication. Validation against real silicon shows a simulation error of at most 8.57% and a 97.26%–99.96% correlation with measured performance, making ATLAS an effective tool for guiding the co-design of memory systems and microarchitectures.
📝 Abstract
Large language models (LLMs) exhibit memory-intensive behavior during decoding, making decoding a key bottleneck in LLM inference. To accelerate decoding, hybrid-bonding-based 3D-DRAM has been adopted in LLM accelerators. While this emerging technology provides strong performance gains over existing hardware, current 3D-DRAM accelerators (3D-Accelerators) rely on closed-source evaluation tools, limiting access to publicly available performance analysis methods. Moreover, existing designs are highly customized for specific scenarios and lack a general, reusable full-stack model for 3D-Accelerators across diverse use cases. To bridge this gap, we present ATLAS, the first silicon-proven Architectural Three-dimensional-DRAM-based LLM Accelerator Simulation framework. Built on commercially deployed multi-layer 3D-DRAM technology, ATLAS introduces unified abstractions for both 3D-Accelerator system architecture and programming primitives to support arbitrary LLM inference scenarios. Validation against real silicon shows that ATLAS achieves ≤8.57% simulation error and 97.26%–99.96% correlation with measured performance. Through design space exploration with ATLAS, we demonstrate its ability to guide architecture design and distill key takeaways for both the 3D-DRAM memory system and the 3D-Accelerator microarchitecture across scenarios. ATLAS will be open-sourced upon publication, enabling further research on 3D-Accelerators.
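The two validation figures quoted above (simulation error and correlation) can be illustrated with a minimal sketch. The abstract does not say how either metric is defined, so this assumes that "simulation error" means the worst-case relative error against measured values and that "correlation" means the Pearson coefficient. The function names and the latency values are hypothetical placeholders, not ATLAS code or results from the paper.

```python
import math


def max_relative_error(simulated, measured):
    """Worst-case |simulated - measured| / measured, in percent.

    Assumption: the abstract does not define "simulation error";
    relative error against the measured value is used here.
    """
    return max(abs(s - m) / m for s, m in zip(simulated, measured)) * 100.0


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumption: the abstract does not name the correlation measure.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder latencies in arbitrary units, invented for illustration only.
measured = [10.0, 20.0, 40.0, 80.0]
simulated = [10.5, 19.0, 42.0, 78.0]

error_pct = max_relative_error(simulated, measured)
corr = pearson_correlation(measured, simulated)
print(f"max relative error: {error_pct:.2f}%")
print(f"Pearson correlation: {corr:.4f}")
```

Reporting both numbers is useful because they answer different questions: the error bound says how far any single prediction can be off, while the correlation says whether the simulator ranks design points in the same order as the hardware does.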
Problem

Research questions and friction points this paper is trying to address.

3D-DRAM
LLM accelerators
performance evaluation
full-stack modeling
memory-intensive decoding
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D-DRAM
LLM accelerator
full-stack simulation
architectural modeling
performance evaluation