🤖 AI Summary
This study investigates human cognitive behavior in table unionability judgment (the task of determining whether two tables can be meaningfully merged) and explores how human–AI collaboration can improve these judgments. Method: Through controlled cognitive experiments and behavioral data analysis, we systematically characterize human judgment biases and consistency patterns; develop a supervised machine learning framework to augment human judgments; and conduct comparative evaluations between humans and large language models (LLMs, e.g., GPT series), assessing both standalone and hybrid performance. Results: We establish foundational cognitive principles governing unionability judgment; propose a Human-in-the-Loop enhancement framework that improves raw human accuracy by a statistically significant margin; and demonstrate that while LLMs outperform humans individually, human–LLM fusion achieves superior overall accuracy. Collectively, this work lays the groundwork for a new paradigm in human–AI collaborative data discovery.
📝 Abstract
Data discovery, and table unionability in particular, have become key tasks in modern data science. However, the human perspective on these tasks remains under-explored. This research therefore investigates human behavior in determining table unionability within data discovery. We designed an experimental survey and conducted a comprehensive analysis assessing human decision-making for table unionability. We use the observations from this analysis to develop a machine learning framework that boosts the (raw) performance of humans. Furthermore, we perform a preliminary study comparing LLM performance with that of humans, which indicates that it is typically better to consider a combination of both. We believe this work lays the foundations for developing future Human-in-the-Loop systems for efficient data discovery.