PUSHtap: PIM-based In-Memory HTAP with Unified Data Storage Format

📅 2025-08-04
🤖 AI Summary
HTAP systems face a fundamental tension between row-oriented storage (optimal for OLTP) and column-oriented storage (optimal for OLAP). This tension makes it difficult to achieve performance isolation, data freshness, and workload-specific optimization at the same time. This paper proposes a unified architecture based on Processing-in-Memory (PIM). It introduces a novel two-dimensional, row-column aligned storage format that supports MVCC-based concurrency control. It also extends commercial PIM hardware so that the CPU drives transactional processing while PIM units perform columnar OLAP computation locally. The key innovation is tightly coupling PIM's near-data columnar access with the CPU's row-oriented transactional execution, which enables workload partitioning and real-time consistency over a single dataset. Experiments show that, compared with multi-instance PIM approaches, the design improves OLAP and OLTP throughput by 3.4× and 4.4×, respectively, making it the first HTAP system to satisfy all three core design objectives concurrently.
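A toy cost model makes the row-versus-column tension concrete. The sketch below is not from the paper. It assumes fixed 8-byte attributes, 64-byte memory lines, and a table of 1,024 records with 8 attributes each. The helper names (`row_store`, `col_store`, `point_read`, `column_scan`) are hypothetical. The model counts how many distinct memory lines are touched under each layout by two kinds of access: an OLTP-style point read of one whole record, and an OLAP-style scan of a single column.

```python
# Toy cost model for the HTAP data-format dilemma (illustrative, not from the paper).
LINE = 64   # assumed memory line / burst size in bytes
FIELD = 8   # assumed fixed width of every attribute in bytes


def row_store(row: int, col: int, nrows: int, ncols: int) -> int:
    """Byte offset of (row, col) when each record is stored contiguously."""
    return (row * ncols + col) * FIELD


def col_store(row: int, col: int, nrows: int, ncols: int) -> int:
    """Byte offset of (row, col) when each attribute is stored contiguously."""
    return (col * nrows + row) * FIELD


def lines_touched(offsets) -> int:
    """Number of distinct memory lines that contain the given byte offsets."""
    return len({off // LINE for off in offsets})


def point_read(layout, row: int, nrows: int, ncols: int) -> int:
    # OLTP-style access: fetch every attribute of a single record.
    return lines_touched(layout(row, c, nrows, ncols) for c in range(ncols))


def column_scan(layout, col: int, nrows: int, ncols: int) -> int:
    # OLAP-style access: fetch one attribute across all records.
    return lines_touched(layout(r, col, nrows, ncols) for r in range(nrows))


if __name__ == "__main__":
    NROWS, NCOLS = 1024, 8
    for name, layout in (("row-store", row_store), ("column-store", col_store)):
        print(name, point_read(layout, 0, NROWS, NCOLS),
              column_scan(layout, 0, NROWS, NCOLS))
```

With these parameters the script prints `row-store 1 1024` and `column-store 8 128`. Each layout wins one workload by 8×, which is the dilemma described above.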

📝 Abstract
Hybrid transaction/analytical processing (HTAP) is an emerging database paradigm that supports both online transaction processing (OLTP) and online analytical processing (OLAP) workloads. Compute-intensive OLTP operations, which manipulate data row by row, suit a row-store format. In contrast, memory-intensive OLAP operations, which are column-centric, benefit from a column-store format. This data-format dilemma prevents HTAP systems from concurrently achieving three design goals: performance isolation, data freshness, and workload-specific optimization. A second relevant technology is Processing-in-Memory (PIM), which integrates computing units (PIM units) inside DRAM devices to accelerate memory-intensive workloads, including OLAP. Our key insight is to combine interleaved CPU access with localized PIM-unit access, providing two-dimensional access to the same data to address the data-format contradiction inherent in HTAP. First, we propose a unified data storage format with novel data alignment and placement techniques that optimize the effective bandwidth of both CPUs and PIM units and exploit PIM's parallelism. Second, we implement the multi-version concurrency control (MVCC) that single-instance HTAP requires. Third, we extend the commercial PIM architecture to support OLAP operations and concurrent access from both PIM units and the CPU. Experiments show that PUSHtap achieves 3.4× and 4.4× higher OLAP and OLTP throughput, respectively, than a multi-instance PIM-based design.
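An idealized model can illustrate the two-dimensional access idea. For illustration only, the sketch assumes that each 64-byte CPU line is split into eight 8-byte words, with word i stored in PIM bank i, and that each record is exactly eight fixed-width attributes. Real hardware interleaving differs in detail. The class and method names (`InterleavedMemory`, `cpu_write_line`, `cpu_read_line`, `pim_scan`) are hypothetical. Under these assumptions, one interleaved CPU access returns a whole record (a row). Meanwhile, each PIM unit that reads only its local bank sees one attribute across every record (a column).

```python
# Idealized model of two-dimensional access (hypothetical, not PUSHtap's actual layout).
NBANKS = 8  # assumption: a CPU line is striped across 8 PIM banks, one word per bank


class InterleavedMemory:
    def __init__(self, nbanks: int = NBANKS) -> None:
        # banks[b] is the local memory that PIM unit b can access directly.
        self.banks = [[] for _ in range(nbanks)]

    def cpu_write_line(self, words) -> int:
        """CPU writes one line: word i goes to the next local slot of bank i."""
        if len(words) != len(self.banks):
            raise ValueError("a line must carry exactly one word per bank")
        for bank, word in zip(self.banks, words):
            bank.append(word)
        return len(self.banks[0]) - 1  # line index just written

    def cpu_read_line(self, line: int) -> list:
        """CPU reads one line: the same local offset is gathered from every bank."""
        return [bank[line] for bank in self.banks]

    def pim_scan(self, bank: int) -> list:
        """PIM unit `bank` reads only its own bank, front to back."""
        return list(self.banks[bank])


if __name__ == "__main__":
    mem = InterleavedMemory()
    # Four records, eight attributes each; attribute c of record r is r * 10 + c.
    for r in range(4):
        mem.cpu_write_line([r * 10 + c for c in range(NBANKS)])
    print(mem.cpu_read_line(2))   # OLTP view: record 2 in one CPU access
    print(sum(mem.pim_scan(3)))   # OLAP view: column 3 read from bank 3, then summed
```

Running it prints `[20, 21, 22, 23, 24, 25, 26, 27]` and `72`. The model ignores records that do not fit exactly in one line, variable-width attributes, effective bandwidth, and parallelism. The paper's alignment and placement techniques target those concerns, and none of them are reproduced here.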
Problem

Research questions and friction points this paper is trying to address.

Resolve HTAP's data-format dilemma for OLTP and OLAP
Optimize CPU and PIM unit bandwidth with unified storage
Enable concurrent HTAP access via MVCC and PIM extension
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified data storage format for HTAP
Multi-version concurrency control (MVCC)
Extended PIM architecture for OLAP
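The abstract describes MVCC as essential for single-instance HTAP, since it lets OLTP writers and OLAP readers share one copy of the data. The sketch below is a generic, textbook snapshot-read MVCC, not PUSHtap's implementation; this page does not describe the paper's version layout. The `MVCCStore` class and its methods are hypothetical. Writers install new versions with increasing commit timestamps instead of overwriting old ones. As a result, a query reading at a fixed snapshot is not affected by later commits.

```python
# Generic snapshot-read MVCC (textbook technique, not PUSHtap's specific design).
from bisect import bisect_right


class MVCCStore:
    def __init__(self) -> None:
        self.clock = 0  # last assigned commit timestamp
        # key -> (ascending commit timestamps, values in the same order)
        self.versions = {}

    def commit_write(self, key, value) -> int:
        """Install a new version at a fresh timestamp; older versions are kept."""
        self.clock += 1
        stamps, values = self.versions.setdefault(key, ([], []))
        stamps.append(self.clock)
        values.append(value)
        return self.clock

    def snapshot(self) -> int:
        """A reader's snapshot is the latest committed timestamp."""
        return self.clock

    def read(self, key, snap_ts: int):
        """Return the newest version committed at or before snap_ts, else None."""
        stamps, values = self.versions.get(key, ([], []))
        i = bisect_right(stamps, snap_ts)
        return values[i - 1] if i else None


if __name__ == "__main__":
    store = MVCCStore()
    store.commit_write("acct:1", 100)               # committed at ts 1
    snap = store.snapshot()                         # an OLAP query starts at ts 1
    store.commit_write("acct:1", 80)                # an OLTP update commits at ts 2
    print(store.read("acct:1", snap))               # the query still sees 100
    print(store.read("acct:1", store.snapshot()))   # a new reader sees 80
```

Garbage collection of old versions and write-write conflict detection are omitted here; a real engine needs both.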
Yilong Zhao
Ph.D. student, UC Berkeley
Computer System, Microarchitecture, Machine Learning System
Mingyu Gao
Tsinghua University
Computer Architecture, Memory Systems, Hardware Security, Domain-Specific Acceleration
Huanchen Zhang
Assistant Professor, Tsinghua University
Database Systems, Data Structures
Fangxin Liu
Shanghai Jiao Tong University
In-memory Computing, Brain-inspired Neuromorphic Computing
Gongye Chen
Shanghai Jiao Tong University, Shanghai, China
He Xian
Shanghai Qi Zhi Institute, Shanghai, China
Haibing Guan
Shanghai Jiao Tong University, Shanghai, China
Li Jiang
Shanghai Jiao Tong University, Shanghai, China; Shanghai Qi Zhi Institute, Shanghai, China