Parallel $k$d-tree with Batch Updates

📅 2024-11-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing parallel k-d trees face challenges in construction and updates when managing large-scale multi-dimensional data. This paper proposes the Pkd-tree, a parallel in-memory k-d tree that supports construction, batch updates (insertion and deletion), and queries including k-nearest neighbor search, range query, and range count. Its two key techniques are (i) a parallel construction algorithm that optimizes work, span, and cache complexity simultaneously, and (ii) reconstruction-based batch-update algorithms that keep the tree weight-balanced. On synthetic and real-world datasets, including uniform and highly skewed data, the Pkd-tree is consistently much faster than state-of-the-art parallel k-d tree implementations in construction and batch updates, with better or competitive query performance. The implementation is open-sourced.

📝 Abstract

The $k$d-tree is one of the most widely used data structures for managing multi-dimensional data. Due to ever-growing data volumes, it is imperative to consider parallelism in $k$d-trees. However, we observe challenges in existing parallel $k$d-tree implementations, for both construction and updates. The goal of this paper is to develop efficient in-memory $k$d-trees that support high parallelism and cache efficiency. We propose the Pkd-tree (Parallel $k$d-tree), a parallel $k$d-tree that is efficient both in theory and in practice. The Pkd-tree supports parallel tree construction, batch updates (insertion and deletion), and various queries including $k$-nearest neighbor search, range query, and range count. We prove that our algorithms have strong theoretical bounds in work (sequential time complexity), span (parallelism), and cache complexity. Our key techniques include 1) an efficient construction algorithm that optimizes work, span, and cache complexity simultaneously, and 2) reconstruction-based update algorithms that guarantee that the tree remains weight-balanced. Combining these algorithmic insights with careful engineering, we obtain a highly optimized implementation of the Pkd-tree. We test the Pkd-tree on various synthetic and real-world datasets, including both uniform and highly skewed data, and compare it with state-of-the-art parallel $k$d-tree implementations. In all tests, the Pkd-tree is consistently much faster than all baselines in construction and updates, with better or competitive query performance. We have released our code.
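
The reconstruction-based update idea can be illustrated with a minimal sequential sketch in Python. It is not the paper's implementation: the names (`Node`, `build`, `batch_insert`, `range_count`, `is_balanced`), the leaf capacity `LEAF_SIZE`, the tolerance `ALPHA`, and the exact balance criterion (no child may hold more than a `0.5 + ALPHA` fraction of its parent's points) are all assumptions made for illustration. The sketch covers insertion only and runs on one thread; the comments mark where a parallel version could fork.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
LEAF_SIZE = 4  # hypothetical leaf capacity
ALPHA = 0.3    # hypothetical tolerance: a child holds at most (0.5 + ALPHA) of its parent


@dataclass
class Node:
    size: int
    depth: int
    points: Optional[List[Point]] = None  # set only on leaves
    dim: int = 0
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: Sequence[Point], depth: int = 0) -> Node:
    """Build a balanced subtree by splitting at the median of a cycling dimension."""
    pts = list(points)
    if len(pts) <= LEAF_SIZE:
        return Node(size=len(pts), depth=depth, points=pts)
    dim = depth % len(pts[0])
    pts.sort(key=lambda p: p[dim])
    mid = len(pts) // 2
    return Node(size=len(pts), depth=depth, dim=dim, split=pts[mid][dim],
                left=build(pts[:mid], depth + 1),
                right=build(pts[mid:], depth + 1))


def flatten(node: Node) -> List[Point]:
    """Collect every point stored in a subtree."""
    if node.points is not None:
        return list(node.points)
    return flatten(node.left) + flatten(node.right)


def batch_insert(node: Node, batch: Sequence[Point]) -> Node:
    """Insert a batch, rebuilding any subtree whose balance would be violated."""
    batch = list(batch)
    if not batch:
        return node
    if node.points is not None:  # leaf: rebuild it together with the new points
        return build(node.points + batch, node.depth)
    go_left = [p for p in batch if p[node.dim] < node.split]
    go_right = [p for p in batch if p[node.dim] >= node.split]
    new_size = node.size + len(batch)
    heavier = max(node.left.size + len(go_left), node.right.size + len(go_right))
    if heavier > (0.5 + ALPHA) * new_size:  # would become unbalanced: rebuild here
        return build(flatten(node) + batch, node.depth)
    # The two recursive calls touch disjoint subtrees, so a parallel
    # version could run them concurrently.
    node.left = batch_insert(node.left, go_left)
    node.right = batch_insert(node.right, go_right)
    node.size = new_size
    return node


def range_count(node: Node, lo: Point, hi: Point) -> int:
    """Count points p with lo[i] <= p[i] <= hi[i] in every dimension i."""
    if node.points is not None:
        return sum(all(l <= x <= h for x, l, h in zip(p, lo, hi))
                   for p in node.points)
    total = 0
    if lo[node.dim] <= node.split:  # left subtree holds coordinates <= split
        total += range_count(node.left, lo, hi)
    if hi[node.dim] >= node.split:  # right subtree holds coordinates >= split
        total += range_count(node.right, lo, hi)
    return total


def is_balanced(node: Node) -> bool:
    """Check the (assumed) weight-balance criterion at every internal node."""
    if node.points is not None:
        return True
    ok = max(node.left.size, node.right.size) <= (0.5 + ALPHA) * node.size
    return ok and is_balanced(node.left) and is_balanced(node.right)
```

In this sketch a skewed batch triggers a rebuild only at the highest node whose balance it would break, so the rest of the tree is left untouched; untouched subtrees and freshly rebuilt ones both satisfy the assumed criterion, which is what `is_balanced` verifies.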

Problem

Research questions and friction points this paper is trying to address.

Multi-dimensional Data

Efficiency Issues

$k$d-tree Optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pkd-tree

Multi-dimensional Data Processing

Efficient Construction and Update

🔎 Similar Papers

Building a Balanced k-d Tree in O(kn log n) Time