Can Agentic AI Match the Performance of Human Data Scientists?

📅 2025-12-24
🤖 AI Summary
This work investigates whether autonomous AI systems can match human data scientists, particularly in scenarios that require domain knowledge to identify implicit latent variables. Method: We construct synthetic multimodal prediction tasks (e.g., property insurance analytics) driven by image semantics, exposing the performance bottlenecks of code-only, general-purpose analytical pipelines when domain knowledge is absent. We introduce the first cross-modal latent variable benchmark and propose a "domain-aware" evaluation paradigm, supported by a synthetic data generation and assessment framework. Contribution/Results: Experiments reveal that current LLM-based AI agents significantly underperform humans on image-semantics-dependent tasks; incorporating domain knowledge into baseline methods improves accuracy by over 35%. The results demonstrate that domain awareness is essential for advancing AI's capabilities in data science, exposing a fundamental limitation in how existing agents integrate domain expertise.

📝 Abstract
Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) have significantly automated data science workflows, but a fundamental question persists: Can these agentic AI systems truly match the performance of human data scientists, who routinely leverage domain-specific knowledge? We explore this question by designing a prediction task where a crucial latent variable is hidden in relevant image data instead of tabular features. As a result, agentic AI that generates generic code for modeling tabular data cannot perform well, while human experts can identify the important hidden variable using domain knowledge. We demonstrate this idea with a synthetic dataset for property insurance. Our experiments show that agentic AI relying on a generic analytics workflow falls short of methods that use domain-specific insights. This highlights a key limitation of current agentic AI for data science and underscores the need for future research to develop agentic AI systems that can better recognize and incorporate domain knowledge.
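The setup described in the abstract can be sketched with a toy simulation. This is a rough illustration, not the authors' actual benchmark: all variable names (`roof_damage`, `home_age`, `sq_ft`) and coefficients are invented, and the image-derived latent variable is simulated directly rather than extracted from real photos. The point is to show why a model restricted to visible tabular features underperforms one that includes the expert-engineered latent variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Tabular features visible to any generic analytics pipeline (hypothetical)
home_age = rng.uniform(0, 50, n)
sq_ft = rng.uniform(800, 4000, n)

# Latent variable that, in the paper's setting, would only be recoverable
# from image data (e.g., roof condition in a property photo);
# here it is simulated directly for illustration.
roof_damage = rng.binomial(1, 0.3, n)

# Claim amount depends strongly on the image-derived latent variable
claim = (100 + 0.01 * sq_ft + 2 * home_age
         + 500 * roof_damage + rng.normal(0, 20, n))

# Tabular-only model: least squares on the visible features
X_tab = np.column_stack([np.ones(n), home_age, sq_ft])
beta_tab, *_ = np.linalg.lstsq(X_tab, claim, rcond=None)
rmse_tab = float(np.sqrt(np.mean((claim - X_tab @ beta_tab) ** 2)))

# "Domain-aware" model: adds the latent variable a human expert would engineer
X_dom = np.column_stack([X_tab, roof_damage])
beta_dom, *_ = np.linalg.lstsq(X_dom, claim, rcond=None)
rmse_dom = float(np.sqrt(np.mean((claim - X_dom @ beta_dom) ** 2)))

print(rmse_tab, rmse_dom)  # tabular-only RMSE is far larger
```

Omitting `roof_damage` leaves its entire effect (a variance of roughly 500² × 0.21) in the residual, so no amount of tuning on the tabular features closes the gap; this mirrors the paper's claim that generic code-only pipelines are bottlenecked when the crucial variable lives outside the table.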
Problem

Research questions and friction points this paper is trying to address.

Agentic AI cannot match human data scientists' domain knowledge use
Current AI fails when crucial variables are hidden in non-tabular data
Generic analytics workflows lack domain-specific insights for complex tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic AI uses generic analytics workflow
Human experts apply domain-specific knowledge
Future AI needs to incorporate domain insights
An Luo
School of Statistics, University of Minnesota, Minneapolis, MN, USA
Jin Du
School of Statistics, University of Minnesota, Minneapolis, MN, USA
Fangqiao Tian
School of Statistics, University of Minnesota, Minneapolis, MN, USA
Xun Xian
Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA
Robert Specht
School of Statistics, University of Minnesota, Minneapolis, MN, USA
Ganghua Wang
Data Science Institute, University of Chicago, Chicago, IL, USA
Xuan Bi
Associate Professor, University of Minnesota
Statistics, Machine Learning, Recommender Systems, Personalization, Data Privacy
Charles Fleming
Cisco Research, San Jose, CA, USA
Jayanth Srinivasa
Cisco Research
Machine Learning, Natural Language Understanding, Federated Learning
Ashish Kundu
Head of Cybersecurity Research, Cisco Research
Security, Privacy & Compliance
Mingyi Hong
Associate Professor, University of Minnesota; Amazon AGI
Machine Learning, Optimization, Generative AI, Signal Processing
Jie Ding
Associate Professor, University of Minnesota Twin Cities
Machine Learning, Statistics, Signal Processing, Deep Learning