🤖 AI Summary
Existing benchmarks for learned database components either overlook or oversimplify data and workload drift, so the adaptability of these components is rarely evaluated systematically. This paper introduces NeurBench, a benchmark suite that applies measurable and controllable drift when evaluating learned database components. Its core contributions are: (1) a single concept, the drift factor, that quantifies diverse types of data and workload drift; (2) a drift-aware data and workload generation framework that simulates real-world drift while preserving the inherent correlations in data and workloads; and (3) a consistent experimental process for evaluating learned query optimizers, learned indexes, and learned concurrency control under identical drift conditions. The resulting experiments reveal how state-of-the-art learned components respond to different drift types and where their performance bottlenecks lie, providing empirical guidance for designing adaptive database systems.
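The summary does not spell out how the drift factor is defined, so the following is only a minimal sketch of the general idea: treating the drift factor as a normalized distance between consecutive data (or workload) distributions, and steering it with a mixing parameter. The names `drift_factor`, `drifted_distribution`, and the parameter `alpha`, as well as the choice of total variation distance and NumPy, are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def drift_factor(p, q):
    # Illustrative drift factor (assumption, not the paper's definition):
    # total variation distance between two discrete distributions,
    # which lands in [0, 1] -- 0 means no drift, 1 means disjoint support.
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return 0.5 * np.abs(p - q).sum()

def drifted_distribution(p, target, alpha):
    # Generate a new distribution that drifts from p toward `target`
    # by mixing: alpha = 0 keeps p unchanged, alpha = 1 jumps to target.
    # The measured drift factor grows monotonically with alpha, so the
    # amount of drift injected into generated data stays controllable.
    p = np.asarray(p, dtype=float)
    target = np.asarray(target, dtype=float)
    return (1 - alpha) * p + alpha * target

# Example: apply controlled drift to a skewed key-access distribution.
rng = np.random.default_rng(0)
keys = rng.zipf(1.5, size=10_000)               # skewed "hot key" accesses
hist0, _ = np.histogram(keys, bins=50, range=(1, 51))
uniform = np.full_like(hist0, 1.0, dtype=float) # drift target: uniform access

for alpha in (0.0, 0.25, 0.5, 1.0):
    p1 = drifted_distribution(hist0 / hist0.sum(), uniform / uniform.sum(), alpha)
    print(f"alpha={alpha:.2f}  drift_factor={drift_factor(hist0, p1):.3f}")
```

Under this (assumed) mixing scheme, a benchmark harness could replay snapshots generated at increasing `alpha` values to probe how a learned index or optimizer degrades as drift accumulates; NeurBench's actual generation framework additionally preserves cross-column and data-workload correlations, which this sketch does not model.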
📝 Abstract
Learned database components, which deeply integrate machine learning into their design, have been extensively studied in recent years. Given the dynamism of databases, where data and workloads continuously drift, it is crucial for learned database components to remain effective and efficient in the face of data and workload drift. Adaptability, therefore, is a key factor in assessing their practical applicability. However, existing benchmarks for learned database components either overlook or oversimplify the treatment of data and workload drift, failing to evaluate learned database components across a broad range of drift scenarios. This paper presents NeurBench, a new benchmark suite that applies measurable and controllable data and workload drift to enable systematic performance evaluations of learned database components. We quantify diverse types of drift by introducing a key concept called the drift factor. Building on this formulation, we propose a drift-aware data and workload generation framework that effectively simulates real-world drift while preserving inherent correlations. We employ NeurBench to evaluate state-of-the-art learned query optimizers, learned indexes, and learned concurrency control within a consistent experimental process, providing insights into their performance under diverse data and workload drift scenarios.