AI Summary
Existing learning-augmented binary search trees (BSTs) are restricted to Zipfian access distributions and lack robustness to prediction errors, dynamic updates, and general access patterns. To address these limitations, we propose Pred-Treap, a prediction-augmented Treap whose node priorities are set as $-\lfloor \log\log(1/w_x) \rfloor + U(0,1)$, where $w_x$ is the predicted access weight of key $x$. This composite priority ensures that node depth is primarily governed by predicted weights while preserving randomness for structural stability. Pred-Treap is the first learned BST variant provably optimal under arbitrary access distributions, simultaneously guaranteeing static optimality, the working-set property, online dynamic updates, and robustness to prediction inaccuracies. It naturally generalizes to B-Treaps for external memory. We provide theoretical bounds proving its static optimality and working-set guarantee. Empirical evaluation demonstrates that Pred-Treap consistently outperforms classical BSTs and the ICML '22 baseline across diverse synthetic and real-world access distributions.
Abstract
We study learning-augmented binary search trees (BSTs) via Treaps with carefully designed priorities. The result is a simple search tree in which the depth of each item $x$ is determined by its predicted weight $w_x$. Specifically, each item $x$ is assigned a composite priority of $-\lfloor \log\log(1/w_x) \rfloor + U(0, 1)$, where $U(0, 1)$ is a uniform random variable on $(0, 1)$. By choosing $w_x$ as the relative frequency of $x$, the resulting search trees achieve static optimality. This approach generalizes the recent learning-augmented BSTs [Lin-Luo-Woodruff ICML '22], which only work for Zipfian distributions, by extending them to arbitrary input distributions. Furthermore, we demonstrate that our method can be generalized to a B-Tree data structure using the B-Treap approach [Golovin ICALP '09]. Our search trees are also capable of leveraging localities in the access sequence through online self-reorganization, thereby achieving the working-set property. Additionally, they are robust to prediction errors and support dynamic operations, such as insertions, deletions, and prediction updates. We complement our analysis with an empirical study, demonstrating that our method outperforms prior work and classic data structures.
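The composite priority can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes base-2 logarithms, weights strictly between 0 and 1, and a max-heap Treap in which larger priorities sit closer to the root. The names `composite_priority`, `insert`, `build_treap`, and `depth_of` are hypothetical helpers introduced only for this illustration.

```python
import math
import random


def composite_priority(weight, rng):
    """Return -floor(log2(log2(1 / weight))) + U(0, 1).

    Assumption: 0 < weight < 1 and base-2 logarithms; the abstract does
    not fix the base or the handling of weight == 1.
    """
    if not 0.0 < weight < 1.0:
        raise ValueError("weight must lie strictly between 0 and 1")
    tier = -math.floor(math.log2(math.log2(1.0 / weight)))
    return tier + rng.random()  # rng.random() is uniform on [0, 1)


class Node:
    def __init__(self, key, priority):
        self.key = key
        self.priority = priority
        self.left = None
        self.right = None


def rotate_right(node):
    pivot = node.left
    node.left = pivot.right
    pivot.right = node
    return pivot


def rotate_left(node):
    pivot = node.right
    node.right = pivot.left
    pivot.left = node
    return pivot


def insert(root, key, priority):
    """Standard Treap insertion: BST order on keys, max-heap on priorities."""
    if root is None:
        return Node(key, priority)
    if key < root.key:
        root.left = insert(root.left, key, priority)
        if root.left.priority > root.priority:
            root = rotate_right(root)
    else:
        root.right = insert(root.right, key, priority)
        if root.right.priority > root.priority:
            root = rotate_left(root)
    return root


def build_treap(weights, seed=0):
    """Insert every key with its composite priority; weights maps key -> w_x."""
    rng = random.Random(seed)
    root = None
    for key, weight in weights.items():
        root = insert(root, key, composite_priority(weight, rng))
    return root


def depth_of(root, key):
    """Number of edges from the root to the node holding key."""
    depth, node = 0, root
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
        depth += 1
    if node is None:
        raise KeyError(key)
    return depth
```

Under these assumptions, weights of 1/4, 1/16, and 1/256 fall into the integer tiers -1, -2, and -3, so a key's tier changes only when $\log(1/w_x)$ roughly doubles. The uniform term only breaks ties inside a tier, which is why heavier keys always end up above lighter tiers in the heap order while each tier internally behaves like an ordinary random Treap.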