Can Agentic AI Match the Performance of Human Data Scientists?

📅 2025-12-24
🤖 AI Summary
This work investigates whether autonomous AI systems can match human data scientists, particularly in scenarios that require domain knowledge to identify implicit latent variables. Method: We construct synthetic multimodal prediction tasks (e.g., property insurance analytics) driven by image semantics, exposing the performance bottlenecks of code-only, general-purpose analytical pipelines when domain knowledge is absent. We introduce the first cross-modal latent variable benchmark and propose a "domain-aware" evaluation paradigm, supported by a synthetic data generation and assessment framework. Contribution/Results: Experiments reveal that current LLM-based AI agents significantly underperform humans on image-semantics-dependent tasks; incorporating domain knowledge into baseline methods improves accuracy by over 35%. The results demonstrate that domain awareness is essential for advancing AI's capabilities in data science, exposing a fundamental limitation in how existing agents integrate domain expertise.

📝 Abstract
Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) have significantly automated data science workflows, but a fundamental question persists: Can these agentic AI systems truly match the performance of human data scientists, who routinely leverage domain-specific knowledge? We explore this question by designing a prediction task where a crucial latent variable is hidden in relevant image data instead of tabular features. As a result, agentic AI that generates generic code for modeling tabular data cannot perform well, while human experts can identify the important hidden variable using domain knowledge. We demonstrate this idea with a synthetic dataset for property insurance. Our experiments show that agentic AI relying on a generic analytics workflow falls short of methods that use domain-specific insights. This highlights a key limitation of current agentic AI for data science and underscores the need for future research to develop agentic AI systems that can better recognize and incorporate domain knowledge.
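The setup described in the abstract can be sketched with a toy simulation. This is a rough illustration, not the authors' actual benchmark: all variable names (`roof_damage`, `home_age`, `sq_ft`) and coefficients are invented, and the image-derived latent variable is simulated directly rather than extracted from real photos. The point is to show why a model restricted to visible tabular features underperforms one that includes the expert-engineered latent variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Tabular features visible to any generic analytics pipeline (hypothetical)
home_age = rng.uniform(0, 50, n)
sq_ft = rng.uniform(800, 4000, n)

# Latent variable that, in the paper's setting, would only be recoverable
# from image data (e.g., roof condition in a property photo);
# here it is simulated directly for illustration.
roof_damage = rng.binomial(1, 0.3, n)

# Claim amount depends strongly on the image-derived latent variable
claim = (100 + 0.01 * sq_ft + 2 * home_age
         + 500 * roof_damage + rng.normal(0, 20, n))

# Tabular-only model: least squares on the visible features
X_tab = np.column_stack([np.ones(n), home_age, sq_ft])
beta_tab, *_ = np.linalg.lstsq(X_tab, claim, rcond=None)
rmse_tab = float(np.sqrt(np.mean((claim - X_tab @ beta_tab) ** 2)))

# "Domain-aware" model: adds the latent variable a human expert would engineer
X_dom = np.column_stack([X_tab, roof_damage])
beta_dom, *_ = np.linalg.lstsq(X_dom, claim, rcond=None)
rmse_dom = float(np.sqrt(np.mean((claim - X_dom @ beta_dom) ** 2)))

print(rmse_tab, rmse_dom)  # tabular-only RMSE is far larger
```

Omitting `roof_damage` leaves its entire effect (a variance of roughly 500² × 0.21) in the residual, so no amount of tuning on the tabular features closes the gap; this mirrors the paper's claim that generic code-only pipelines are bottlenecked when the crucial variable lives outside the table.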
Problem

Research questions and friction points this paper is trying to address.

Agentic AI cannot match human data scientists' domain knowledge use
Current AI fails when crucial variables are hidden in non-tabular data
Generic analytics workflows lack domain-specific insights for complex tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic AI uses generic analytics workflow
Human experts apply domain-specific knowledge
Future AI needs to incorporate domain insights
An Luo
School of Statistics, University of Minnesota, Minneapolis, MN, USA
Jin Du
School of Statistics, University of Minnesota, Minneapolis, MN, USA
Fangqiao Tian
School of Statistics, University of Minnesota, Minneapolis, MN, USA
Xun Xian
Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA
Robert Specht
School of Statistics, University of Minnesota, Minneapolis, MN, USA
Ganghua Wang
Data Science Institute, University of Chicago, Chicago, IL, USA
Xuan Bi
Associate Professor, University of Minnesota
Statistics, Machine Learning, Recommender Systems, Personalization, Data Privacy
Charles Fleming
Cisco Research, San Jose, CA, USA
Jayanth Srinivasa
Cisco Research
Machine Learning, Natural Language Understanding, Federated Learning
Ashish Kundu
Head of Cybersecurity Research, Cisco Research
Security, Privacy & Compliance
Mingyi Hong
Associate Professor, University of Minnesota; Amazon AGI
Machine Learning, Optimization, Generative AI, Signal Processing
Jie Ding
Associate Professor, University of Minnesota Twin Cities
Machine Learning, Statistics, Signal Processing, Deep Learning