🤖 AI Summary
Current Processing-in-Memory (PIM) architectures lack efficient support for dynamic memory allocation. To address this gap, this paper introduces the first high-performance dynamic memory allocator designed specifically for real-world PIM hardware. Our approach employs design-space exploration to jointly optimize metadata layout and partition placement, establishing a scalable, hardware-software co-designed allocation architecture; we further propose a lightweight per-core hardware cache—novel in PIM contexts—to drastically reduce inter-die memory access overhead. Evaluated on a real PIM platform, our allocator achieves an average 66× improvement in allocation throughput; integrating the hardware cache yields an additional 31% speedup. For dynamic graph updates, throughput increases by 28×. This work bridges a critical gap in PIM systems—dynamic memory management—and provides foundational infrastructure for PIM-native programming models.
📝 Abstract
Dynamic memory allocation is essential in modern programming but remains under-supported in current PIM devices. In this work, we first conduct a design space exploration of PIM memory allocators, examining optimal metadata placement and management strategies. Building on these insights, we propose PIM-malloc, a fast and scalable allocator for real PIM hardware, improving allocation performance by $66\times$. We further enhance this design with a lightweight, per-PIM core hardware cache for dynamic allocation, achieving an additional $31\%$ performance gain. Finally, we demonstrate the effectiveness of PIM-malloc using a dynamic graph update workload, achieving a $28\times$ throughput increase.