Parallel Joinable B-Trees in the Fork-Join I/O Model

📅 2025-10-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing parallel tree-set operations (union, intersection, difference) in the join-based framework have I/O-inefficient access patterns and lack rigorous I/O complexity bounds. Method: We propose the Fork-Join I/O Model, a cost model for fork-join parallelism, and use it to give the first rigorous I/O cost analysis of join-based parallel set operations on B-trees. By co-designing the B-tree structure and the join-based algorithms, we jointly optimize I/O work and I/O span. Contribution/Results: Our approach achieves $O(m \log_B(n/m))$ I/O work and $O(\log_B m \cdot \log_2 \log_B n + \log_B n)$ I/O span, where $n$ and $m \leq n$ denote the input sizes and $B$ is the block size. This work fills a gap in the I/O complexity theory of parallel tree-set operations and offers an approach, backed by provable performance guarantees, to efficient dynamic maintenance of large-scale ordered data.
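To get a feel for how the two stated bounds scale, the expressions inside the O-notation can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, hidden constants are dropped, and clamping each logarithm at 1 for small inputs is an assumed convention rather than something stated in the paper.

```python
import math


def log_base(x: float, base: float) -> float:
    """log_base(x), clamped below at 1 (assumed convention for small inputs)."""
    return max(1.0, math.log2(x) / math.log2(base))


def io_work_bound(n: int, m: int, B: int) -> float:
    """The expression m * log_B(n / m), with constants dropped."""
    return m * log_base(n / m, B)


def io_span_bound(n: int, m: int, B: int) -> float:
    """The expression log_B m * log_2 log_B n + log_B n, with constants dropped."""
    log_b_n = log_base(n, B)
    return log_base(m, B) * max(1.0, math.log2(log_b_n)) + log_b_n


# Example sizes (chosen arbitrarily): n = 2^40 keys, m = 2^20 keys, B = 2^10 keys per block.
n, m, B = 2**40, 2**20, 2**10
print(io_work_bound(n, m, B), io_span_bound(n, m, B))
```

These values are not predicted I/O counts; they only show that the work term grows linearly in the smaller input $m$ while the span term stays polylogarithmic.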

📝 Abstract

Balanced search trees are widely used to maintain dynamic ordered data efficiently. The join-based framework has been widely studied as a way to support efficient set operations (e.g., union, intersection, difference) on trees. It has received particular attention in the parallel setting, where it has been shown to enable simple and theoretically efficient set operations. Despite the widespread adoption of parallel join-based trees, a major drawback of previous work on such data structures is the inefficiency of their input/output (I/O) access patterns. Some recent work (e.g., C-trees and PaC-trees) has focused on more I/O-friendly implementations of these algorithms. Surprisingly, however, there are no results bounding their I/O costs, and it remains open whether these algorithms can provide tight, provable I/O-cost guarantees on trees. This paper studies efficient parallel algorithms for set operations on search trees in the join-based framework, with a special focus on I/O efficiency. To capture the I/O efficiency of these parallel algorithms, we introduce a new computational model, the Fork-Join I/O Model, which measures I/O costs under fork-join parallelism: the total number of block transfers (I/O work) and their critical path (I/O span). Under this model, we propose a new solution based on B-trees. Our parallel algorithm computes the union, intersection, and difference of two B-trees with $O(m \log_B(n/m))$ I/O work and $O(\log_B m \cdot \log_2 \log_B n + \log_B n)$ I/O span, where $n$ and $m \leq n$ are the sizes of the two trees and $B$ is the block size.
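The distinction between I/O work and I/O span can be illustrated with a small accounting sketch. It assumes the usual work-span composition rules carry over to block transfers (costs add along sequential steps, and forked branches contribute their sum to work but only their maximum to span); the abstract does not spell out the model's formal definition, and the class and function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Step:
    transfers: int  # block transfers performed by one sequential step


@dataclass
class Seq:
    parts: List["Node"]  # sub-computations executed one after another


@dataclass
class Par:
    branches: List["Node"]  # sub-computations forked in parallel, then joined


Node = Union[Step, Seq, Par]


def io_work(node: Node) -> int:
    """Total block transfers over the whole computation."""
    if isinstance(node, Step):
        return node.transfers
    children = node.parts if isinstance(node, Seq) else node.branches
    return sum(io_work(child) for child in children)


def io_span(node: Node) -> int:
    """Block transfers along the longest dependency chain (assumed composition)."""
    if isinstance(node, Step):
        return node.transfers
    if isinstance(node, Seq):
        return sum(io_span(child) for child in node.parts)
    return max((io_span(child) for child in node.branches), default=0)


# 2 transfers, then a fork into a 5-transfer branch and a (1 + 3)-transfer branch, then 1 more.
example = Seq([Step(2), Par([Step(5), Seq([Step(1), Step(3)])]), Step(1)])
print(io_work(example), io_span(example))
```

In this toy computation the fork performs 9 transfers in total but adds only 5 to the critical path, which is the gap that a span bound is meant to capture.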

Problem

Research questions and friction points this paper is trying to address.

Achieving I/O efficiency in parallel join-based tree algorithms

Establishing provable I/O cost bounds for set operations

Developing a computational model for parallel I/O analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces the Fork-Join I/O Model, which measures I/O work (total block transfers) and I/O span (their critical path)

Proposes parallel B-tree algorithms for set operations

Achieves $O(m \log_B(n/m))$ I/O work and $O(\log_B m \cdot \log_2 \log_B n + \log_B n)$ I/O span for union, intersection, and difference
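For readers unfamiliar with the join-based framework, the following minimal sketch shows the generic divide-and-conquer shape of a join-based union. It is not the paper's B-tree algorithm: sorted Python lists stand in for trees, the hypothetical `split` and `join` helpers copy data in linear time, and the two recursive calls run sequentially here, whereas a fork-join runtime would execute them in parallel.

```python
from bisect import bisect_left
from typing import List, Tuple


def split(keys: List[int], pivot: int) -> Tuple[List[int], bool, List[int]]:
    """Split a sorted list into (keys < pivot, whether pivot occurs, keys > pivot)."""
    i = bisect_left(keys, pivot)
    found = i < len(keys) and keys[i] == pivot
    right = keys[i + 1:] if found else keys[i:]
    return keys[:i], found, right


def join(left: List[int], pivot: int, right: List[int]) -> List[int]:
    """Combine two sorted lists around a pivot that lies strictly between them."""
    return left + [pivot] + right


def union(a: List[int], b: List[int]) -> List[int]:
    """Union of two sorted, duplicate-free lists via split and join."""
    if not a:
        return list(b)
    if not b:
        return list(a)
    mid = len(b) // 2
    pivot = b[mid]  # plays the role of the root of the second tree
    a_left, _, a_right = split(a, pivot)
    # The two sub-problems touch disjoint key ranges, so they are independent.
    left = union(a_left, b[:mid])
    right = union(a_right, b[mid + 1:])
    return join(left, pivot, right)


print(union([1, 3, 5, 7], [2, 3, 6]))
```

The sketch only conveys the recursive structure; the cost guarantees in the paper depend on how split and join are realized on B-trees, which is not reproduced here.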

🔎 Similar Papers

Building a Balanced k-d Tree in O(kn log n) Time