DP-Bench: A Benchmark for Evaluating Data Product Creation Systems

📅 2025-12-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: A standardized benchmark for evaluating automated data product generation (as distinct from serving raw data) is currently lacking. Method: This paper introduces DP-Bench, the first end-to-end benchmark for this task. It builds on ELT pipeline modeling and the Text-to-SQL paradigm to define a multi-dimensional assessment protocol covering correctness, utility, and maintainability, and proposes an LLM-driven joint evaluation workflow that integrates program synthesis with SQL execution validation. Contribution/Results: The DP-Bench dataset and accompanying evaluation toolkit are publicly released, enabling reproducible, quantitative comparison across diverse LLM-based baselines and improving consistency and comparability in cross-system evaluations of data product generation.

📝 Abstract
A data product is created with the intention of solving a specific problem, addressing a specific business use case or meeting a particular need, going beyond just serving data as a raw asset. Data products enable end users to gain greater insights about their data. Since the concept was first introduced over a decade ago, there has been considerable work, especially in industry, to create data products manually or semi-automatically. However, hardly any benchmark exists to evaluate automatic data product creation. In this work, we present a benchmark, the first of its kind, for this task. We call it DP-Bench. We describe how this benchmark was created by taking advantage of existing work in ELT (Extract-Load-Transform) and Text-to-SQL benchmarks. We also propose a number of LLM-based approaches that can serve as baselines for generating data products automatically. We make DP-Bench and supplementary materials available at https://huggingface.co/datasets/ibm-research/dp-bench.
Problem

Research questions and friction points this paper is trying to address.

Lack of benchmarks for automatic data product creation
Need to evaluate systems generating data products automatically
Proposing DP-Bench as a first benchmark for this task
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces DP-Bench as a benchmark for evaluating data product creation systems
Leverages existing ELT and Text-to-SQL benchmarks for benchmark construction
Proposes LLM-based baseline approaches for automatic data product generation
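The baseline approaches are not detailed on this page; a minimal sketch of how a prompt-based baseline might assemble its input from a source schema and a business need follows. All names (`build_prompt`, the instruction wording, the example schema) are illustrative assumptions, not the paper's actual prompts.

```python
def build_prompt(business_need, schema_ddl, examples=()):
    """Assemble a simple zero/few-shot prompt asking an LLM to produce
    the SQL transformations that define a data product."""
    parts = [
        "You are a data engineer. Given the source schema below, write the",
        "SQL (ELT transformations) that materializes a data product",
        "answering the business need.",
        "",
        "Source schema:",
        schema_ddl,
        "",
    ]
    for need, sql in examples:  # optional few-shot demonstrations
        parts += [f"Business need: {need}", f"SQL: {sql}", ""]
    parts.append(f"Business need: {business_need}")
    parts.append("SQL:")
    return "\n".join(parts)
```

The resulting string would be sent to any chat or completion model; swapping `examples` between empty and populated toggles zero-shot vs. few-shot behavior.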
Faisal Chowdhury
IBM Research, Yorktown Heights, USA
Sola Shirai
IBM Research, Yorktown Heights, USA
Sarthak Dash
IBM Research, New York, USA
Nandana Mihindukulasooriya
IBM Research AI, SMIEEE
Semantic Web, Linked Data, Knowledge Graphs, NLP, KBQA
Horst Samulowitz
IBM Research
Artificial Intelligence, AI for AI, Combinatorial Optimization, Meta-Learning, Automated Machine Learning