DP-Bench: A Benchmark for Evaluating Data Product Creation Systems

📅 2025-12-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: A standardized benchmark for evaluating automated data product generation (as distinct from serving raw data) is currently lacking. Method: This paper introduces DP-Bench, the first end-to-end benchmark for this task. It builds on ELT pipeline modeling and the Text-to-SQL paradigm to define a multi-dimensional assessment protocol covering correctness, utility, and maintainability, and proposes an LLM-driven joint evaluation workflow that integrates program synthesis with SQL execution validation. Contribution/Results: The DP-Bench dataset and accompanying evaluation toolkit are publicly released, enabling reproducible, quantitative comparison across diverse LLM-based baselines and improving consistency and comparability in cross-system evaluations of data product generation.

📝 Abstract
A data product is created with the intention of solving a specific problem, addressing a specific business use case or meeting a particular need, going beyond just serving data as a raw asset. Data products enable end users to gain greater insights about their data. Since the concept was first introduced over a decade ago, there has been considerable work, especially in industry, to create data products manually or semi-automatically. However, hardly any benchmark exists to evaluate automatic data product creation. In this work, we present a benchmark, the first of its kind, for this task. We call it DP-Bench. We describe how this benchmark was created by taking advantage of existing work in ELT (Extract-Load-Transform) and Text-to-SQL benchmarks. We also propose a number of LLM-based approaches that can serve as baselines for generating data products automatically. We make DP-Bench and supplementary materials available at https://huggingface.co/datasets/ibm-research/dp-bench.
Problem

Research questions and friction points this paper is trying to address.

Lack of benchmarks for automatic data product creation
Need to evaluate systems generating data products automatically
Proposing DP-Bench as a first benchmark for this task
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces DP-Bench as a benchmark for evaluating data product creation systems
Leverages existing ELT and Text-to-SQL benchmarks for benchmark construction
Proposes LLM-based baseline approaches for automatic data product generation
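The baseline approaches are not detailed on this page; a minimal sketch of how a prompt-based baseline might assemble its input from a source schema and a business need follows. All names (`build_prompt`, the instruction wording, the example schema) are illustrative assumptions, not the paper's actual prompts.

```python
def build_prompt(business_need, schema_ddl, examples=()):
    """Assemble a simple zero/few-shot prompt asking an LLM to produce
    the SQL transformations that define a data product."""
    parts = [
        "You are a data engineer. Given the source schema below, write the",
        "SQL (ELT transformations) that materializes a data product",
        "answering the business need.",
        "",
        "Source schema:",
        schema_ddl,
        "",
    ]
    for need, sql in examples:  # optional few-shot demonstrations
        parts += [f"Business need: {need}", f"SQL: {sql}", ""]
    parts.append(f"Business need: {business_need}")
    parts.append("SQL:")
    return "\n".join(parts)
```

The resulting string would be sent to any chat or completion model; swapping `examples` between empty and populated toggles zero-shot vs. few-shot behavior.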
Faisal Chowdhury
IBM Research, Yorktown Heights, USA
Sola Shirai
IBM Research, Yorktown Heights, USA
Sarthak Dash
IBM Research, New York, USA
Nandana Mihindukulasooriya
IBM Research AI, SMIEEE
Semantic Web, Linked Data, Knowledge Graphs, NLP, KBQA
Horst Samulowitz
IBM Research
Artificial Intelligence, AI for AI, Combinatorial Optimization, Meta-Learning, Automated Machine Learning