🤖 AI Summary
This work investigates whether autonomous AI systems can match human data scientists, particularly in scenarios where domain knowledge is needed to identify implicit latent variables. Method: We construct synthetic multimodal prediction tasks (e.g., property insurance analytics) in which the target is driven by image semantics, exposing the performance bottleneck of code-only, general-purpose analytical pipelines that lack domain knowledge. We introduce the first cross-modal latent variable benchmark and propose a “domain-aware” evaluation paradigm, supported by a synthetic data generation and assessment framework. Contribution/Results: Experiments show that current LLM-based AI agents significantly underperform humans on tasks that depend on image semantics; incorporating domain knowledge into baseline methods improves accuracy by over 35%. The results indicate that domain awareness is essential for advancing AI’s capabilities in data science and expose a fundamental limitation in how existing agents integrate domain expertise.
📝 Abstract
Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) have significantly automated data science workflows, but a fundamental question persists: Can these agentic AI systems truly match the performance of human data scientists, who routinely leverage domain-specific knowledge? We explore this question by designing a prediction task in which a crucial latent variable is hidden in the relevant image data rather than in the tabular features. As a result, agentic AI that generates generic code for modeling tabular data cannot perform well, whereas human experts can identify the important hidden variable using domain knowledge. We demonstrate this idea with a synthetic dataset for property insurance. Our experiments show that agentic AI relying on a generic analytics workflow falls short of methods that use domain-specific insights. This highlights a key limitation of current agentic AI for data science and underscores the need for future research to develop agentic AI systems that can better recognize and incorporate domain knowledge.
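The setup described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' dataset or pipeline: the feature names (`house_age`, `roof_damaged`), the cost formula, and all numeric parameters are hypothetical choices made for illustration. It also assumes that the latent variable has already been read off the image (for example, by an expert), rather than modeling any image processing. It compares a tabular-only linear fit against a fit that is given the hidden variable.

```python
import random
import statistics


def make_synthetic_claims(n=2000, seed=0):
    """Generate hypothetical property records (illustrative assumption only).

    Each row is (house_age, roof_damaged, claim_cost). roof_damaged stands in
    for a latent variable that would be visible only in an image, not in the
    tabular features.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        house_age = rng.uniform(0, 50)
        roof_damaged = 1 if rng.random() < 0.3 else 0
        # Assumed cost model: linear in age, plus a large jump for roof damage.
        claim_cost = 100 * house_age + 5000 * roof_damaged + rng.gauss(0, 500)
        rows.append((house_age, roof_damaged, claim_cost))
    return rows


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(preds, ys):
    """Root mean squared error between predictions and targets."""
    return (sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)) ** 0.5


def run_experiment(seed=0):
    rows = make_synthetic_claims(seed=seed)
    split = len(rows) // 2
    train, test = rows[:split], rows[split:]
    test_y = [y for _, _, y in test]

    # Generic workflow: model the target from the tabular feature alone.
    slope, intercept = fit_line([a for a, _, _ in train], [y for _, _, y in train])
    baseline_preds = [slope * a + intercept for a, _, _ in test]

    # Domain-aware workflow: assume the latent variable was recovered from the
    # image, and fit one line per value of that variable.
    models = {}
    for flag in (0, 1):
        group = [(a, y) for a, d, y in train if d == flag]
        models[flag] = fit_line([a for a, _ in group], [y for _, y in group])
    aware_preds = [models[d][0] * a + models[d][1] for a, d, _ in test]

    return rmse(baseline_preds, test_y), rmse(aware_preds, test_y)


baseline_rmse, aware_rmse = run_experiment()
print(f"tabular-only RMSE: {baseline_rmse:.0f}")
print(f"with latent variable RMSE: {aware_rmse:.0f}")
```

Under these assumed parameters, the tabular-only fit has no way to explain the variance contributed by the hidden flag, so its error should be noticeably larger than that of the domain-aware fit. The size of the gap depends entirely on the made-up coefficients and says nothing about the magnitudes reported in the paper.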