🤖 AI Summary
This work addresses the performance bottleneck in Processing-in-Memory (PIM) architectures caused by redundant, coarse-grained data transfers between the host and DPUs. To mitigate this inefficiency, the authors propose PIM-CACHE, a lightweight data staging layer that, for the first time, brings content-aware mechanisms to PIM data movement. By dynamically identifying workload similarity at runtime, PIM-CACHE eliminates redundant data copies and optimizes cache management. Implemented on the UPMEM PIM platform and evaluated on both synthetic and real-world genomic datasets, the approach significantly reduces data transfer overhead and improves overall system efficiency.
📝 Abstract
Processing-in-memory (PIM) architectures bring computation closer to data, reducing the processor-memory transfer bottleneck of traditional processor-centric designs. Novel hardware solutions, such as UPMEM's in-memory processing technology, achieve this by integrating low-power DRAM processing units (DPUs) into memory DIMMs, enabling massive parallelism and improved memory bandwidth. Paradoxically, however, these PIM architectures introduce mandatory coarse-grained data transfers between host DRAM and the DPUs, which often become the new bottleneck. We present PIM-CACHE, a lightweight data staging layer that dynamically eliminates redundant data transfers to PIM DPUs by exploiting workload similarity, a technique we call content-aware copy (CAC). We evaluate PIM-CACHE on both synthetic workloads and real-world genome datasets, demonstrating its effectiveness in reducing PIM data transfer overhead.
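To make the content-aware copy idea concrete, here is a minimal sketch of a staging layer that skips a host→DPU transfer when the buffer's content hash matches what is already resident on that DPU. This is an illustration only: the class, `transfer_fn`, and hash-based matching are assumptions for exposition, not the paper's actual mechanism or UPMEM's SDK API.

```python
import hashlib

class ContentAwareCopy:
    """Illustrative content-aware staging layer (names are hypothetical).

    Tracks a content digest per DPU and suppresses a copy when the
    same bytes are already staged on the target DPU.
    """

    def __init__(self, transfer_fn):
        self.transfer_fn = transfer_fn  # performs the actual host->DPU copy
        self.resident = {}              # dpu_id -> digest of content on that DPU
        self.skipped = 0                # count of redundant transfers avoided

    def copy_to_dpu(self, dpu_id, buf: bytes) -> bool:
        """Copy buf to dpu_id unless identical content is already there.

        Returns True if a transfer happened, False if it was skipped.
        """
        digest = hashlib.sha256(buf).digest()
        if self.resident.get(dpu_id) == digest:
            self.skipped += 1           # identical content: no copy needed
            return False
        self.transfer_fn(dpu_id, buf)
        self.resident[dpu_id] = digest
        return True

# Example: the second copy of identical data is elided.
cac = ContentAwareCopy(transfer_fn=lambda dpu, buf: None)
cac.copy_to_dpu(0, b"genome-chunk-A")   # transferred
cac.copy_to_dpu(0, b"genome-chunk-A")   # skipped: same content resident
```

A real implementation would also need an eviction policy for the per-DPU digest table and a cheaper similarity check than full-buffer hashing, which is where the runtime workload-similarity detection described above comes in.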