JSPIM: A Skew-Aware PIM Accelerator for High-Performance Databases Join and Select Operations

📅 2025-08-11
🤖 AI Summary
Database analytical workloads suffer from DRAM bandwidth and latency bottlenecks in join and select operations. Existing PIM architectures reuse CPU-based algorithms, resulting in low parallelism, high off-chip communication overhead, and poor resilience to data skew. To address these challenges, this paper proposes JSPIM, a data-skew-aware PIM acceleration architecture. Through algorithm-hardware co-design, it integrates subarray-level parallelism with rank-level processing, restructures hash tables for O(1) lookups, and performs fine-grained in-memory load balancing and skew mitigation, substantially reducing redundant off-chip data movement. Evaluated on the SSB benchmark together with DuckDB, the architecture achieves a 2.5× improvement in end-to-end throughput and a 1.1× to 28× speedup per query, with only 7% additional data storage overhead and a 2.1% per-rank increase in PIM-enabled chip area.

📝 Abstract
Database applications are increasingly bottlenecked by memory bandwidth and latency due to the memory wall and the limited scalability of DRAM. Join queries, central to analytical workloads, require intensive memory access and are particularly vulnerable to inefficiencies in data movement. While Processing-in-Memory (PIM) offers a promising solution, existing designs typically reuse CPU-oriented join algorithms, limiting parallelism and incurring costly inter-chip communication. Additionally, data skew, a main challenge in CPU-based joins, remains unresolved in current PIM architectures. We introduce JSPIM, a PIM module that accelerates hash join and, by extension, corresponding select queries through algorithm-hardware co-design. JSPIM deploys parallel search engines within each subarray and redesigns hash tables to achieve O(1) lookups, fully exploiting PIM's fine-grained parallelism. To mitigate skew, our design integrates subarray-level parallelism with rank-level processing, eliminating redundant off-chip transfers. Evaluations show JSPIM delivers 400x to 1000x speedup on join queries versus DuckDB. When paired with DuckDB for the full SSB benchmark, JSPIM achieves an overall 2.5x throughput improvement (individual query gains of 1.1x to 28x), at just a 7% data overhead and 2.1% per-rank PIM-enabled chip area increase.
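For readers unfamiliar with the operation being accelerated, the following is a minimal software sketch of a conventional hash join (build phase, then probe phase), assuming a Python dict as the hash table. It is not JSPIM's hardware design. The function name `hash_join`, the column names, and the sample rows are hypothetical, loosely styled after a fact/dimension join.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two lists of dicts with a classic build/probe hash join."""
    # Build phase: index the (usually smaller) table by its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: one expected O(1) hash lookup per probe row.
    joined = []
    for row in probe_rows:
        for match in table.get(row[probe_key], ()):
            joined.append({**match, **row})
    return joined

# Hypothetical sample data (not taken from the paper or from SSB).
dates = [{"d_key": 1, "d_year": 1997}, {"d_key": 2, "d_year": 1998}]
orders = [
    {"lo_key": 10, "lo_date": 1, "lo_revenue": 50},
    {"lo_key": 11, "lo_date": 2, "lo_revenue": 70},
    {"lo_key": 12, "lo_date": 3, "lo_revenue": 20},  # no matching date row
]
result = hash_join(dates, orders, "d_key", "lo_date")
```

In this sketch every probe touches the hash table once, which is why memory access behavior tends to dominate the cost of the operation on large tables.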
Problem

Research questions and friction points this paper is trying to address.

Addresses memory bandwidth and latency bottlenecks in database applications
Improves join query performance using PIM with algorithm-hardware co-design
Mitigates data skew in PIM architectures for efficient parallel processing
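To illustrate why skew is a friction point for any design that spreads join keys across parallel units, the sketch below partitions a hypothetical skewed key stream by key modulo and measures the resulting load imbalance. The partition count, the key distributions, and the helpers `partition_loads` and `imbalance` are assumptions made for illustration; they do not model JSPIM's subarrays or ranks.

```python
from collections import Counter

def partition_loads(keys, num_partitions):
    """Count how many tuples land in each partition under modulo partitioning."""
    loads = Counter(key % num_partitions for key in keys)
    return [loads.get(p, 0) for p in range(num_partitions)]

def imbalance(loads):
    """Ratio of the busiest partition's load to the mean load (1.0 = balanced)."""
    return max(loads) / (sum(loads) / len(loads))

# Hypothetical inputs: uniform keys vs. a stream where key 0 dominates.
uniform_keys = list(range(1000))
skewed_keys = [0] * 700 + list(range(1, 301))

uniform_loads = partition_loads(uniform_keys, 4)
skewed_loads = partition_loads(skewed_keys, 4)
```

With these made-up inputs the uniform stream splits evenly, while the skewed stream sends most tuples to a single partition. When partitions are processed in parallel, the busiest one bounds completion time, which is the general effect that skew mitigation aims to avoid.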
Innovation

Methods, ideas, or system contributions that make the work stand out.

PIM module with parallel search engines
Redesigned hash tables for O(1) lookups
Subarray-level parallelism integrated with rank-level processing to mitigate data skew
Sabiha Tajdari
Computer Science Department, University of Virginia
Anastasia Ailamaki
School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne
Sandhya Dwarkadas
Walter N. Munster Professor, Computer Science, University of Virginia