Something's Fishy In The Data Lake: A Critical Re-evaluation of Table Union Search Benchmarks

📅 2025-05-27
🤖 AI Summary
Existing Table Union Search (TUS) benchmarks suffer from severe biases: simple baselines such as column-name matching outperform state-of-the-art models by 12.7% F1 on average, revealing distorted evaluation and an inability to assess genuine semantic understanding. This work is the first to systematically identify four critical flaws in current TUS benchmarks, and it proposes three foundational principles for robust evaluation (reproducibility, semantic-driven design, and distributional robustness), shifting the paradigm from metric-centric to capability-centric assessment. Using empirical analysis, statistical hypothesis testing, and controlled ablation studies, we establish a benchmark quality diagnostic framework that quantifies four key dimensions: schema similarity, value-distribution leakage, label bias, and semantic sparsity. Our guidelines have been formally adopted as the official evaluation standard by the ACM SIGMOD ’25 Workshop on Data Discovery.
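To make the headline finding concrete, the sketch below shows the kind of trivial column-name-matching baseline the summary refers to: Jaccard overlap between normalized column-name sets, used to rank data-lake tables. The function names and ranking setup are illustrative assumptions, not the authors' actual code.

```python
def column_name_unionability(query_cols, candidate_cols):
    """Jaccard overlap between lowercased, stripped column-name sets."""
    q = {c.strip().lower() for c in query_cols}
    c = {c.strip().lower() for c in candidate_cols}
    if not (q | c):
        return 0.0
    return len(q & c) / len(q | c)

def rank_candidates(query_cols, lake, k=10):
    """Rank data-lake tables (name -> column list) by name overlap."""
    scored = [(name, column_name_unionability(query_cols, cols))
              for name, cols in lake.items()]
    scored.sort(key=lambda pair: -pair[1])
    return scored[:k]
```

A baseline this shallow ignores cell values entirely; the paper's point is that benchmark scores it achieves can nonetheless exceed those of learned semantic models, which indicates the benchmarks reward surface schema similarity.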

📝 Abstract
Recent table representation learning and data discovery methods tackle table union search (TUS) within data lakes, which involves identifying tables that can be unioned with a given query table to enrich its content. These methods are commonly evaluated using benchmarks that aim to assess semantic understanding in real-world TUS tasks. However, our analysis of prominent TUS benchmarks reveals several limitations that allow simple baselines to perform surprisingly well, often outperforming more sophisticated approaches. This suggests that current benchmark scores are heavily influenced by dataset-specific characteristics and fail to effectively isolate the gains from semantic understanding. To address this, we propose essential criteria for future benchmarks to enable a more realistic and reliable evaluation of progress in semantic table union search.
Problem

Research questions and friction points this paper is trying to address.

Evaluating table union search benchmarks critically
Identifying limitations in current TUS benchmark performance
Proposing criteria for realistic semantic table union evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyze limitations in current TUS benchmarks
Propose criteria for realistic benchmark evaluation
Focus on semantic understanding in table union
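One way to read the proposed criteria is as measurable diagnostics over a benchmark's labeled pairs. The sketch below illustrates one such check, assumed here for exposition (it is not the authors' framework): whether positive pairs score systematically higher on raw column-name overlap than negatives, which would mean labels leak through the schema alone.

```python
def name_overlap(cols_a, cols_b):
    """Jaccard overlap between lowercased column-name sets."""
    a = {c.lower() for c in cols_a}
    b = {c.lower() for c in cols_b}
    return len(a & b) / len(a | b) if (a | b) else 0.0

def schema_leakage_gap(pairs):
    """pairs: iterable of (cols_a, cols_b, label), label 1 = unionable.

    Returns mean name overlap of positive pairs minus that of negative
    pairs; a large gap suggests the benchmark is solvable by column
    names alone, without any semantic understanding of the values.
    """
    pos = [name_overlap(a, b) for a, b, y in pairs if y == 1]
    neg = [name_overlap(a, b) for a, b, y in pairs if y == 0]
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(pos) - mean(neg)
```

Analogous statistics could probe value-distribution leakage or label bias, turning the paper's qualitative criteria into reportable numbers per benchmark.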
👥 Authors
Allaa Boutaleb
Sorbonne Université, CNRS, LIP6, F-75005 Paris, France
Bernd Amann
Professor of Computer Science, Sorbonne Université (UPMC)
Databases · Big Data · Data Integration · Data Quality
Hubert Naacke
Sorbonne Université, CNRS, LIP6, F-75005 Paris, France