Parallel R-tree-based Spatial Query Processing on a Commercial Processing-in-Memory System

📅 2026-04-15

🤖 AI Summary

This study targets two bottlenecks in large-scale spatial queries: limited memory bandwidth and the high cost of moving data across the memory hierarchy. It presents the first mapping of R-tree range queries onto commercial Processing-in-Memory (PIM) hardware, the UPMEM system and its DRAM Processing Units (DPUs). The authors propose a broadcast-based execution strategy. The CPU builds the R-tree bottom-up, broadcasts the upper levels to all DPUs for global filtering, and distributes the lower levels across DPUs so that batched queries run in parallel in a CPU-DPU system. Compared with subtree partitioning, this keeps communication from dominating execution, which improves scalability and energy efficiency. On the Lakes dataset with 2,540 DPUs, the system achieves up to a 3.66× kernel speedup and a 2.70× end-to-end speedup over CPU R-tree search on the same system. The PIM kernel also consumes less energy than the corresponding CPU search, approximately 3.4× less according to the authors; on Lakes it uses 59.6 kJ vs. 167.0 kJ (a ratio of about 2.8×).
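The broadcast-based flow can be illustrated with a small host-side simulation. This is a minimal sketch under stated assumptions, not the authors' UPMEM implementation: it uses Sort-Tile-Recursive (STR) packing as one possible bottom-up construction, treats each DPU as an ordinary Python dictionary, assigns lower-level subtrees to DPUs round-robin, and has every DPU filter the whole query batch against the broadcast top levels before searching only the subtrees it holds. All function names (`build_bottom_up`, `global_filter`, `batched_range_query`, and so on) and parameters (`fanout`, `cut_from_top`) are hypothetical.

```python
import math
from dataclasses import dataclass, field

Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """Closed-interval overlap test for two axis-aligned rectangles."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(rects) -> Rect:
    """Minimum bounding rectangle (MBR) of a non-empty iterable of rectangles."""
    rects = list(rects)
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


@dataclass
class Node:
    mbr: Rect
    children: list = field(default_factory=list)  # child Nodes (internal nodes)
    entries: list = field(default_factory=list)   # (rect, rect_id) pairs (leaves)


def str_pack(items, fanout, key):
    """STR grouping (assumed): slice by x-center, then tile each slice by y-center."""
    n_slices = math.ceil(math.sqrt(math.ceil(len(items) / fanout)))
    slice_size = n_slices * fanout
    by_x = sorted(items, key=lambda it: key(it)[0] + key(it)[2])
    groups = []
    for s in range(0, len(by_x), slice_size):
        tile = sorted(by_x[s:s + slice_size], key=lambda it: key(it)[1] + key(it)[3])
        groups += [tile[g:g + fanout] for g in range(0, len(tile), fanout)]
    return groups


def build_bottom_up(rects, fanout):
    """Build the tree level by level on the CPU; levels[0] = leaves, levels[-1] = [root]."""
    pairs = [(r, i) for i, r in enumerate(rects)]
    leaves = [Node(union(r for r, _ in g), entries=g)
              for g in str_pack(pairs, fanout, key=lambda e: e[0])]
    levels = [leaves]
    while len(levels[-1]) > 1:
        levels.append([Node(union(c.mbr for c in g), children=g)
                       for g in str_pack(levels[-1], fanout, key=lambda n: n.mbr)])
    return levels


def global_filter(root, cut_index, query):
    """Walk only the broadcast top levels; return indices of overlapping subtree roots."""
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if not intersects(node.mbr, query):
            continue
        if id(node) in cut_index:  # reached the cut: a distributed subtree, stop here
            hits.append(cut_index[id(node)])
        else:
            stack.extend(node.children)
    return hits


def local_search(node, query, out):
    """Ordinary R-tree descent inside one subtree stored on a DPU."""
    if intersects(node.mbr, query):
        out.extend(rid for rect, rid in node.entries if intersects(rect, query))
        for child in node.children:
            local_search(child, query, out)


def dpu_kernel(root, cut_index, my_subtrees, queries):
    """Work done by one simulated DPU for the whole query batch."""
    results = []
    for q_id, q in enumerate(queries):
        for idx in global_filter(root, cut_index, q):
            if idx in my_subtrees:  # search only subtrees held by this DPU
                found = []
                local_search(my_subtrees[idx], q, found)
                results += [(q_id, rid) for rid in found]
    return results


def batched_range_query(rects, queries, n_dpus=4, fanout=4, cut_from_top=2):
    levels = build_bottom_up(rects, fanout)
    root = levels[-1][0]
    cut_level = max(0, len(levels) - 1 - cut_from_top)
    cut_index = {id(n): i for i, n in enumerate(levels[cut_level])}
    # Round-robin placement of lower-level subtrees onto DPUs (assumed policy).
    dpu_memory = [{} for _ in range(n_dpus)]
    for i, sub_root in enumerate(levels[cut_level]):
        dpu_memory[i % n_dpus][i] = sub_root
    # Every DPU sees the same top levels and the same query batch (the broadcast).
    merged = []
    for mem in dpu_memory:
        merged += dpu_kernel(root, cut_index, mem, queries)
    return sorted(merged)  # host-side merge of per-DPU results
```

Calling `batched_range_query(rects, queries, n_dpus=8, fanout=4)` on lists of `(xmin, ymin, xmax, ymax)` tuples returns sorted `(query_index, rect_index)` pairs. The trade-off the sketch exposes is that every DPU repeats the cheap top-level traversal for every query, in exchange for receiving the whole batch in one broadcast instead of having the host route each query to specific DPUs.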

📝 Abstract

The growing volume of data in scientific domains has made spatial query processing increasingly challenging due to high data transfer costs across the memory hierarchy and limited memory bandwidth. To address these bottlenecks and reduce the energy consumed by data movement, this work explores Processing-in-Memory (PIM) systems by executing range queries directly inside memory chips. Unlike prior PIM studies centered on linear scans or hash-based queries, this work is the first to map R-tree range queries onto commercial PIM hardware. The proposed broadcast-based method constructs the R-tree bottom-up on the CPU, broadcasts the top levels to UPMEM DPUs (DRAM Processing Units) for global filtering, and distributes the lower levels for parallel batched queries in a CPU-DPU system. We evaluate our approach on a commercial UPMEM PIM system with up to 2,540 DPUs, using two real spatial datasets, Sports (999K rectangles) and Lakes (8.4M rectangles), and a synthetic dataset with up to 16M rectangles and 3.9M queries to assess scalability. Across all datasets, broadcast-based execution consistently outperforms subtree partitioning by preventing communication from dominating execution. On the Lakes dataset, strong scaling from 512 to 2,540 DPUs reduces kernel time from 64.9 s to 17.6 s, and the PIM version achieves up to 3.66x kernel and 2.70x end-to-end speedup relative to the CPU R-tree search on the same system. The PIM kernel also consumes approximately 3.4x less energy than the corresponding CPU search (e.g., 59.6 kJ vs. 167.0 kJ on Lakes), demonstrating scalable and energy-efficient hierarchical spatial range queries.
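The Lakes scaling and energy figures can be checked with a few lines of arithmetic. This is only a worked calculation on the published numbers; "parallel efficiency" (measured speedup divided by the ideal speedup of 2,540/512 ≈ 4.96×) is a derived quantity introduced here for illustration, not one the abstract reports, and the helper names are hypothetical.

```python
def speedup(t_before: float, t_after: float) -> float:
    """How many times shorter the new time is than the old one."""
    return t_before / t_after


def parallel_efficiency(t_small: float, t_large: float, p_small: int, p_large: int) -> float:
    """Strong-scaling efficiency: achieved speedup over the ideal ratio p_large / p_small."""
    return speedup(t_small, t_large) / (p_large / p_small)


# Lakes kernel time: 64.9 s on 512 DPUs, 17.6 s on 2,540 DPUs.
kernel_scaling = speedup(64.9, 17.6)
efficiency = parallel_efficiency(64.9, 17.6, 512, 2540)

# Lakes energy: CPU R-tree search 167.0 kJ vs. PIM kernel 59.6 kJ.
energy_ratio = 167.0 / 59.6

print(f"{kernel_scaling:.2f}x shorter kernel time, {efficiency:.0%} strong-scaling efficiency")
print(f"{energy_ratio:.2f}x less energy on Lakes")
```

Going from 512 to 2,540 DPUs shortens the kernel by about 3.69× against an ideal of about 4.96×, roughly 74% strong-scaling efficiency. The Lakes energy pair on its own works out to about 2.8×.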

Problem

Research questions and friction points this paper is trying to address.

spatial query processing

memory bandwidth

data movement

Processing-in-Memory

R-tree

Innovation

Methods, ideas, or system contributions that make the work stand out.

Processing-in-Memory

R-tree

Spatial Query