NeurBench: Benchmarking Learned Database Components with Data and Workload Drift Modeling

📅 2025-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing benchmarks for learned database components inadequately capture the diversity of data and workload drift, leaving the dynamic adaptability of these components insufficiently evaluated. This paper introduces NeurBench, a benchmark suite that supports quantifiable, controllable drift of multiple types. Its core contributions are: (1) a unified “drift factor” that formally characterizes diverse forms of data and workload drift; (2) a drift-aware data and workload generation framework that enables precise, controllable drift modeling while preserving real-world correlations; and (3) a systematic evaluation of learned query optimizers, learned indexes, and learned concurrency control under a consistent experimental process. The experiments reveal distinct response mechanisms and performance bottlenecks of state-of-the-art learned components across drift types, providing an empirical foundation for designing adaptive database systems.

📝 Abstract
Learned database components, which deeply integrate machine learning into their design, have been extensively studied in recent years. Given the dynamism of databases, where data and workloads continuously drift, it is crucial for learned database components to remain effective and efficient in the face of data and workload drift. Adaptability, therefore, is a key factor in assessing their practical applicability. However, existing benchmarks for learned database components either overlook or oversimplify the treatment of data and workload drift, failing to evaluate learned database components across a broad range of drift scenarios. This paper presents NeurBench, a new benchmark suite that applies measurable and controllable data and workload drift to enable systematic performance evaluations of learned database components. We quantify diverse types of drift by introducing a key concept called the drift factor. Building on this formulation, we propose a drift-aware data and workload generation framework that effectively simulates real-world drift while preserving inherent correlations. We employ NeurBench to evaluate state-of-the-art learned query optimizers, learned indexes, and learned concurrency control within a consistent experimental process, providing insights into their performance under diverse data and workload drift scenarios.
Problem

Research questions and friction points this paper is trying to address.

Evaluating adaptability of learned database components to data and workload drift.
Developing a benchmark suite for systematic performance evaluation under drift.
Simulating real-world drift scenarios to test learned database components.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces NeurBench for systematic performance evaluations of learned database components.
Quantifies diverse types of drift with a novel drift factor concept.
Simulates real-world drift with a correlation-preserving data and workload generation framework.
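The abstract describes the drift factor as a single knob that quantifies how far generated data and workloads drift from a baseline. The paper's formal definition is not given on this page, but the idea can be illustrated with a minimal sketch, assuming (purely for illustration) that drift is modeled as a mixture between a base population and a drifted one, with the drift factor as the mixing weight; `drifted_sample`, `base_pop`, and `drift_pop` are hypothetical names, not from the paper:

```python
import random

def drifted_sample(base_pop, drift_pop, drift_factor, n, seed=0):
    """Draw n items, picking from drift_pop with probability drift_factor.

    drift_factor = 0.0 reproduces the base distribution unchanged;
    drift_factor = 1.0 yields a fully drifted distribution.
    """
    rng = random.Random(seed)
    return [rng.choice(drift_pop) if rng.random() < drift_factor
            else rng.choice(base_pop)
            for _ in range(n)]

# Example: keys 0-99 are the base workload, keys 100-199 the drifted one.
keys = drifted_sample(list(range(100)), list(range(100, 200)),
                      drift_factor=0.3, n=1000)
```

Under this toy model, sweeping the drift factor from 0 to 1 produces a controlled spectrum of drift scenarios against which a learned index or optimizer could be stress-tested, which mirrors the "measurable and controllable" drift the abstract claims.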
Zhanhao Zhao
National University of Singapore
Gang Chen
Zhejiang University
Haotian Gao
The University of Tokyo
Manuel Rigger
National University of Singapore
Beng Chin Ooi
National University of Singapore
Naili Xing
National University of Singapore
Lingze Zeng
National University of Singapore
Meihui Zhang
Beijing Institute of Technology