🤖 AI Summary
Existing benchmarks focus narrowly on single-table factoid question answering and lack evaluation frameworks for data product discovery in realistic analytical scenarios. Method: We introduce DPBench—the first user-request-driven benchmark for data product discovery—supporting joint retrieval across heterogeneous assets (tables and unstructured text). We propose a novel "data product–level" evaluation paradigm and develop a table-text co-discovery method grounded in semantic clustering and multi-LLM consensus verification, ensuring full provenance and executable request fulfillment. Contribution/Results: DPBench comprises 1.2K+ expert-crafted Data Product Requests (DPRs) covering multi-source, multi-step, and auditable requirements. Empirical analysis demonstrates the feasibility of hybrid retrieval (dense + sparse) for this task, exposes its key bottlenecks, and establishes the first standardized evaluation foundation for automated data product engineering.
📝 Abstract
Data products are reusable, self-contained assets designed for specific business use cases. Automating their discovery and generation is of great industry interest, as it enables asset discovery in large data lakes and supports analytical Data Product Requests (DPRs). No benchmark currently exists specifically for data product discovery: existing datasets focus on answering single factoid questions over individual tables rather than on collecting multiple data assets into broader, coherent products. To address this gap, we introduce DPBench, the first user-request-driven data product benchmark over hybrid table-text corpora. Our framework systematically repurposes existing table-text QA datasets by clustering related tables and passages into coherent data products, generating professional-level analytical requests that span both data sources, and validating benchmark quality through multi-LLM evaluation. DPBench preserves full provenance while producing actionable, analyst-like data product requests. Baseline experiments with hybrid retrieval methods establish the feasibility of DPR evaluation, reveal current limitations, and point to new opportunities for automatic data product discovery research. Code and datasets are available at: https://anonymous.4open.science/r/data-product-benchmark-BBA7/
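The abstract's hybrid retrieval baseline (dense + sparse) can be illustrated with a minimal score-fusion sketch. This is an assumption-laden toy, not the paper's implementation: the `sparse_score` (lexical overlap standing in for BM25), `dense_score` (cosine similarity standing in for an embedding retriever), and the weighted-sum fusion with parameter `alpha` are all hypothetical simplifications.

```python
import math

def sparse_score(query_tokens, doc_tokens):
    """Toy lexical-overlap score (stand-in for a BM25-style sparse retriever)."""
    q, d = set(query_tokens), set(doc_tokens)
    return len(q & d) / math.sqrt(len(d)) if d else 0.0

def dense_score(q_vec, d_vec):
    """Cosine similarity between embeddings (stand-in for a dense retriever)."""
    dot = sum(a * b for a, b in zip(q_vec, d_vec))
    nq = math.sqrt(sum(a * a for a in q_vec))
    nd = math.sqrt(sum(b * b for b in d_vec))
    return dot / (nq * nd) if nq and nd else 0.0

def hybrid_rank(query, assets, alpha=0.5):
    """Rank candidate assets (tables or text passages) for a data product
    request by a weighted fusion of sparse and dense scores."""
    scored = []
    for asset in assets:
        s = sparse_score(query["tokens"], asset["tokens"])
        d = dense_score(query["vec"], asset["vec"])
        scored.append((alpha * d + (1 - alpha) * s, asset["id"]))
    return sorted(scored, reverse=True)
```

A request is represented here as tokenized text plus an embedding; in practice the fusion weight `alpha` (and the choice of score normalization) is exactly the kind of knob whose limitations a benchmark like DPBench would expose.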