Dissecting CPU-GPU Unified Physical Memory on AMD MI300A APUs

📅 2025-08-18

🤖 AI Summary

Problem: Traditional discrete GPU architectures in HPC and data centers keep CPU and GPU memory separate, which complicates memory management and adds substantial performance overhead. Method: This work presents the first systematic characterization of the CPU–GPU Unified Physical Memory (UPM) architecture of the AMD MI300A, using microbenchmarks and application-level evaluation to quantify memory latency, bandwidth, coherence overhead, TLB behavior, page fault handling, Infinity Cache utilization, and unified memory allocation mechanisms. It further proposes UPM-aware system software optimizations and application migration methodologies. Contribution/Results: Across the evaluated HPC applications, the unified memory model matches or outperforms explicit memory management while reducing memory costs by up to 44%. These findings support UPM as a viable memory paradigm for heterogeneous computing.

📝 Abstract

Discrete GPUs are a cornerstone of HPC and data center systems, requiring management of separate CPU and GPU memory spaces. Unified Virtual Memory (UVM) has been proposed to ease the burden of memory management, but at a high cost in performance. The recent introduction of AMD's MI300A Accelerated Processing Units (APUs), as deployed in the El Capitan supercomputer, enables HPC systems featuring integrated CPU and GPU with Unified Physical Memory (UPM) for the first time. This work presents the first comprehensive characterization of the UPM architecture on MI300A. We first analyze the UPM system properties, including memory latency, bandwidth, and coherence overhead. We then assess the efficiency of the system software in memory allocation, page fault handling, TLB management, and Infinity Cache utilization. We propose a set of porting strategies for transforming applications for the UPM architecture and evaluate six applications on the MI300A APU. Our results show that applications on UPM using the unified memory model can match or outperform those in the explicitly managed model, while reducing memory costs by up to 44%.
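
To make the "memory costs" comparison in the abstract concrete, here is a minimal toy model in Python. It assumes, as an illustration only, that the explicitly managed model keeps one host copy and one device copy of every buffer touched by both CPU and GPU, while the unified model keeps a single allocation. The `Buffer` class, the function names, and the buffer sizes are all hypothetical and are not taken from the paper; the numbers do not reproduce the reported 44%.

```python
from dataclasses import dataclass

GiB = 1 << 30


@dataclass
class Buffer:
    """Hypothetical description of one application buffer."""
    name: str
    nbytes: int
    used_by_cpu: bool
    used_by_gpu: bool


def explicit_footprint(buffers):
    """Assumed explicit model: one copy per side that touches the buffer."""
    total = 0
    for b in buffers:
        copies = int(b.used_by_cpu) + int(b.used_by_gpu)
        total += copies * b.nbytes
    return total


def unified_footprint(buffers):
    """Assumed unified model: one shared allocation per buffer in use."""
    return sum(b.nbytes for b in buffers if b.used_by_cpu or b.used_by_gpu)


def relative_saving(buffers):
    """Fraction of the explicit footprint that the unified model avoids."""
    explicit = explicit_footprint(buffers)
    if explicit == 0:
        return 0.0
    return (explicit - unified_footprint(buffers)) / explicit


# Made-up workload: sizes are illustrative, not measurements.
buffers = [
    Buffer("input_grid", 4 * GiB, used_by_cpu=True, used_by_gpu=True),
    Buffer("gpu_scratch", 6 * GiB, used_by_cpu=False, used_by_gpu=True),
    Buffer("host_log", 1 * GiB, used_by_cpu=True, used_by_gpu=False),
]

print(f"explicit: {explicit_footprint(buffers) / GiB:.1f} GiB")
print(f"unified:  {unified_footprint(buffers) / GiB:.1f} GiB")
print(f"saving:   {relative_saving(buffers):.1%}")
```

In this sketch the saving depends only on how much data is shared between CPU and GPU: buffers used by one side alone cost the same in both models, so workloads with little shared data would see little reduction.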

Problem

Research questions and friction points this paper is trying to address.

Analyze AMD MI300A APU's Unified Physical Memory performance

Evaluate system software efficiency in memory management

Develop strategies for optimizing applications on UPM

Innovation

Methods, ideas, or system contributions that make the work stand out.

AMD MI300A APUs enable CPU-GPU Unified Physical Memory

Comprehensive characterization of UPM architecture performance

Unified memory model matches or outperforms explicit memory management while reducing memory costs by up to 44%
