Membrane: Accelerating Database Analytics with Bank-Level DRAM-PIM Filtering

📅 2025-04-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the data movement bottleneck between CPU and memory in in-memory database analytics, this paper proposes a CPU-PIM collaborative query processing framework that transforms JOIN operations into fine-grained, PIM-friendly filtering tasks directly executable in DRAM. The key contributions are: (1) the first bank-level DRAM-PIM mapping mechanism enabling fine-grained filtering; (2) a principled CPU-PIM division-of-labor paradigm that preserves system compatibility while balancing parallelism and flexibility; and (3) synergistic optimizations including pre-join denormalization and selective aggregation offloading. End-to-end evaluation on the TPC-H and SSB benchmarks shows that the approach achieves a 5.92×–6.5× speedup over conventional CPU-only execution, outperforms full denormalization by 3.03×–4.05×, and incurs only 9%–17% additional memory overhead.

📝 Abstract
In-memory database query processing frequently involves substantial data transfers between the CPU and memory, leading to inefficiencies due to the Von Neumann bottleneck. Processing-in-Memory (PIM) architectures offer a viable solution to alleviate this bottleneck. In our study, we employ a commonly used software approach that streamlines JOIN operations into simpler selection or filtering tasks using pre-join denormalization, which makes the query processing workload more amenable to PIM acceleration. This research explores the DRAM design landscape to evaluate how efficiently these filtering tasks can be executed across the DRAM hierarchy and their effect on overall application speedup. We also find that operations such as aggregates are more suitably executed on the CPU rather than in PIM. Thus, we propose a cooperative query processing framework that capitalizes on the strengths of both CPU and PIM, where (i) the DRAM-based PIM block, with its massive parallelism, supports scan operations, while (ii) the CPU, with its flexible architecture, supports the rest of query execution. This allows us to utilize both PIM and CPU where appropriate and to avoid dramatic changes to the overall system architecture. With these minimal modifications, our methodology enables us to faithfully perform end-to-end performance evaluations using established analytics benchmarks such as TPC-H and the star-schema benchmark (SSB). Our findings show that this novel mapping approach improves performance, delivering a 5.92x/6.5x speedup compared to a traditional schema and a 3.03x–4.05x speedup compared to a denormalized schema with 9%–17% memory overhead, depending on the degree of partial denormalization. Further, we provide insights into query selectivity, memory overheads, and software optimizations in the context of PIM-based filtering, which better explain the behavior and performance of these systems across the benchmarks.
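The pre-join denormalization idea in the abstract can be illustrated with a minimal sketch (plain Python, not the paper's implementation, with made-up table contents): customer attributes are embedded into each order row ahead of time, so a JOIN-plus-filter query over a normalized schema becomes a single-table filter, the kind of fine-grained scan a bank-level DRAM-PIM unit can execute in place.

```python
# Hypothetical normalized schema: orders reference customers by key.
customers = {1: {"region": "ASIA"}, 2: {"region": "EUROPE"}}
orders = [
    {"o_id": 10, "cust": 1, "price": 100},
    {"o_id": 11, "cust": 2, "price": 250},
    {"o_id": 12, "cust": 1, "price": 75},
]

# Normalized plan: JOIN orders with customers, then filter on region.
join_result = sum(o["price"] for o in orders
                  if customers[o["cust"]]["region"] == "ASIA")

# Pre-join denormalization: materialize the join once, up front.
denormalized = [{**o, "region": customers[o["cust"]]["region"]}
                for o in orders]

# The same query is now a pure filter over one flat table.
filter_result = sum(o["price"] for o in denormalized
                    if o["region"] == "ASIA")
```

Both plans return the same answer; the denormalized one trades extra memory (the duplicated `region` column, mirroring the paper's 9%–17% overhead result) for a scan that no longer needs join logic.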
Problem

Research questions and friction points this paper is trying to address.

Reducing CPU-memory data transfer inefficiencies in databases
Optimizing JOIN operations via PIM-friendly filtering tasks
Balancing workload between CPU and PIM for query processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses DRAM-PIM for efficient data filtering
Combines CPU and PIM for query processing
Employs pre-join denormalization for PIM acceleration
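The CPU-PIM division of labor above can be sketched as follows. This is a hedged simulation, not the paper's hardware design: function names and data are illustrative, and the "PIM" side is modeled as a scan that returns only a compact bitmask, so the raw column never crosses the memory bus; the CPU then runs the flexible part of the query (aggregation) over the qualifying rows.

```python
def pim_bank_filter(column, predicate):
    """Simulate an in-bank PIM scan: evaluate the predicate
    element-wise and emit a bitmask instead of the raw column."""
    return [predicate(v) for v in column]

def cpu_aggregate(column, mask):
    """CPU-side aggregation over the rows the PIM scan selected."""
    return sum(v for v, keep in zip(column, mask) if keep)

# Hypothetical denormalized columns.
price = [100, 250, 75, 300]
region_is_asia = [True, False, True, True]

# PIM filters, CPU aggregates; only the bitmask crosses the boundary.
mask = pim_bank_filter(region_is_asia, lambda r: r)
total = cpu_aggregate(price, mask)
```

The design point this models is the one the abstract argues for: scans map well onto the massive bank-level parallelism of DRAM, while aggregation stays on the CPU, whose flexible pipeline handles it more efficiently.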
Akhil Shekar
University of Virginia
Kevin Gaffney
Microsoft
Martin Prammer
Carnegie Mellon University
Khyati Kiyawat
University of Virginia
Lingxi Wu
University of Virginia
Helena Caminal
Cornell University
Zhenxing Fan
University of Virginia
Computer Architecture
Yimin Gao
University of Virginia
AI hardware, Processing-in-Memory, hardware/software codesign, VLSI, hardware security
A. Venkat
University of Virginia
José F. Martínez
Cornell University
Jignesh Patel
Carnegie Mellon University
Kevin Skadron
Harry Douglas Forsyth Professor of Computer Science, University of Virginia
computer architecture, processing in memory, hardware acceleration, automata processing, heterogeneous computing