🤖 AI Summary
Problem: Existing ε-bounded piecewise linear approximation (ε-PLA) algorithms for learned indexes suffer from weak theoretical analysis, unsystematic empirical evaluation, and unclear trade-offs among accuracy, model size, and query performance.
Method: We propose a novel ε-PLA fitting algorithm and, for the first time, derive a tight lower bound Ω(κ·ε²) on its expected segment coverage. We conduct rigorous theoretical complexity analysis and large-scale benchmarking across diverse learned index structures (e.g., ALEX, LISA).
Results: Our analysis systematically characterizes the three-way trade-off among error bound, model size, and query throughput. Experiments show that, under strict ε-error constraints, our algorithm reduces model size by 23% and improves query throughput by 18% on average over state-of-the-art methods. This work provides both a new provable design tool and practical optimization guidelines for learned indexes.
📝 Abstract
A growing trend in the database and system communities is to augment conventional index structures, such as B+-trees, with machine learning (ML) models. Among these, error-bounded Piecewise Linear Approximation ($ε$-PLA) has emerged as a popular choice due to its simplicity and effectiveness. Despite its central role in many learned indexes, the design and analysis of $ε$-PLA fitting algorithms remain underexplored. In this paper, we revisit $ε$-PLA from both theoretical and empirical perspectives, with a focus on its application in learned index structures. We first establish a fundamentally improved lower bound of $Ω(κ \cdot ε^2)$ on the expected segment coverage for existing $ε$-PLA fitting algorithms, where $κ$ is a data-dependent constant. We then present a comprehensive benchmark of state-of-the-art $ε$-PLA algorithms when used in different learned data structures. Our results highlight key trade-offs among model accuracy, model size, and query performance, providing actionable guidelines for the principled design of future learned data structures.
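To make "$ε$-PLA" and "segment coverage" concrete, the following is a minimal Python sketch of one common greedy fitting strategy, the shrinking-cone method. It is an illustration under stated assumptions, not the algorithm or benchmark code from the paper: keys are assumed to be sorted and strictly increasing, a key's position is its array index, and the names `fit_epsilon_pla`, `max_error`, and `mean_coverage` are hypothetical.

```python
import math
import random


def _pick_slope(lo, hi):
    # Any slope in [lo, hi] respects the error bound; take the midpoint,
    # or lo when the cone is still unbounded (single-key segment).
    return lo if math.isinf(hi) else (lo + hi) / 2.0


def fit_epsilon_pla(keys, eps):
    """Greedy shrinking-cone fit over sorted, strictly increasing keys.

    Returns a list of segments (first_key, slope, first_pos, n_keys) such
    that |first_pos + slope * (key - first_key) - pos| <= eps for every
    key covered by the segment.
    """
    segments = []
    k0, p0 = keys[0], 0          # segment origin (anchored at its first key)
    lo, hi = 0.0, math.inf       # feasible slope interval (the "cone")
    count = 1
    for pos in range(1, len(keys)):
        dx = keys[pos] - k0
        dy = pos - p0
        # Slopes that keep this key within eps of its true position.
        new_lo = max(lo, (dy - eps) / dx)
        new_hi = min(hi, (dy + eps) / dx)
        if new_lo > new_hi:
            # Cone collapsed: close the segment and start a new one here.
            segments.append((k0, _pick_slope(lo, hi), p0, count))
            k0, p0 = keys[pos], pos
            lo, hi = 0.0, math.inf
            count = 1
        else:
            lo, hi = new_lo, new_hi
            count += 1
    segments.append((k0, _pick_slope(lo, hi), p0, count))
    return segments


def max_error(keys, segments):
    """Largest absolute position error of the fitted model over all keys."""
    worst = 0.0
    for k0, slope, p0, n in segments:
        for pos in range(p0, p0 + n):
            pred = p0 + slope * (keys[pos] - k0)
            worst = max(worst, abs(pred - pos))
    return worst


def mean_coverage(keys, segments):
    """Average number of keys covered by one segment."""
    return len(keys) / len(segments)


if __name__ == "__main__":
    random.seed(0)
    keys = sorted(random.sample(range(10**6), 5000))
    for eps in (8, 16, 32, 64):
        segs = fit_epsilon_pla(keys, eps)
        print(eps, len(segs), round(mean_coverage(keys, segs), 1))
```

Running the script prints, for each $ε$, the number of segments and the mean segment coverage on synthetic uniform keys. Comparing how coverage grows as $ε$ doubles gives an informal feel for the quadratic dependence on $ε$ stated in the bound, although a single synthetic run is not evidence for it.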