Humans, Machine Learning, and Language Models in Union: A Cognitive Study on Table Unionability

📅 2025-06-15

🤖 AI Summary
Objective: This study investigates human cognitive behavior in table unionability judgment (the task of determining whether two tables can be meaningfully merged) and explores human–AI collaboration for optimization.
Method: Through controlled cognitive experiments and behavioral data analysis, we systematically characterize human judgment biases and consistency patterns; develop a supervised machine learning framework to augment human judgments; and conduct comparative evaluations between humans and large language models (LLMs, e.g., GPT series), assessing both standalone and hybrid performance.
Results: We establish foundational cognitive principles governing unionability judgment; propose a Human-in-the-Loop enhancement framework that improves raw human accuracy by a statistically significant margin; and demonstrate that while LLMs outperform humans individually, human–LLM fusion achieves superior overall accuracy. Collectively, this work lays the groundwork for a new paradigm in human–AI collaborative data discovery.

📝 Abstract
Data discovery, and table unionability in particular, have become key tasks in modern data science. However, the human perspective on these tasks remains under-explored. This research therefore investigates human behavior in determining table unionability within data discovery. We designed an experimental survey and conducted a comprehensive analysis of human decision-making for table unionability. We use the observations from this analysis to develop a machine learning framework that boosts the (raw) performance of humans. Furthermore, we perform a preliminary study of how LLM performance compares to that of humans, which indicates that it is typically better to consider a combination of both. We believe that this work lays the foundations for developing future Human-in-the-Loop systems for efficient data discovery.

Problem

Research questions and friction points this paper is trying to address.

Investigates human behavior in table unionability decisions
Develops ML framework to enhance human performance
Compares LLM and human performance in data discovery

Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine learning framework enhances human performance
Combines human and LLM for better results
Experimental survey analyzes human decision-making
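
For readers new to the task, the sketch below shows what a unionability decision looks like on toy data. It is a deliberately naive heuristic based on column-name overlap, not the framework described in the paper. The names `column_overlap` and `looks_unionable`, the 0.5 threshold, and the sample tables are all assumptions made for illustration.

```python
from typing import Dict, List

# A toy table: column name -> list of cell values (hypothetical representation).
Table = Dict[str, List[object]]


def normalize(name: str) -> str:
    """Lowercase a column name and unify separators so 'City' matches 'city'."""
    return name.strip().lower().replace("_", " ")


def column_overlap(a: Table, b: Table) -> float:
    """Jaccard similarity between the normalized column-name sets of two tables."""
    cols_a = {normalize(c) for c in a}
    cols_b = {normalize(c) for c in b}
    if not cols_a and not cols_b:
        return 0.0
    return len(cols_a & cols_b) / len(cols_a | cols_b)


def looks_unionable(a: Table, b: Table, threshold: float = 0.5) -> bool:
    """Naive decision rule: call two tables unionable if enough columns match.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    return column_overlap(a, b) >= threshold


# Illustrative tables (made-up data).
cities_eu: Table = {
    "City": ["Paris", "Rome"],
    "Country": ["France", "Italy"],
    "Population": [2_100_000, 2_800_000],
}
cities_us: Table = {
    "city": ["Austin"],
    "country": ["USA"],
    "mayor": ["n/a"],
}
products: Table = {
    "sku": ["A-1"],
    "price": [9.99],
}

if __name__ == "__main__":
    print(looks_unionable(cities_eu, cities_us))
    print(looks_unionable(cities_eu, products))
```

A rule like this ignores cell values and semantics entirely, which is where a human or LLM judgment would be expected to add information beyond schema matching.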