LLM-Based Data Science Agents: A Survey of Capabilities, Challenges, and Future Directions

📅 2025-10-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM-powered data science agents lack a systematic, lifecycle-aligned evaluation framework. Method: Based on a comprehensive literature review and systematic classification, we propose the first taxonomy aligned with the full six-stage data science lifecycle and conduct a cross-dimensional analysis of 45 representative agents along five axes: reasoning paradigms, multimodal integration, tool orchestration, explainability, and safety alignment. Results: Over 90% of current systems lack robust trustworthiness mechanisms and exhibit critical gaps in deployment monitoring, multimodal reasoning, and governance. Key contributions include (1) establishing an initial benchmark for evaluating data science agents; (2) empirically characterizing fundamental capability boundaries; and (3) identifying three pivotal research directions: trustworthiness, low-latency inference, and transparency enhancement.

📝 Abstract
Recent advances in large language models (LLMs) have enabled a new class of AI agents that automate multiple stages of the data science workflow by integrating planning, tool use, and multimodal reasoning across text, code, tables, and visuals. This survey presents the first comprehensive, lifecycle-aligned taxonomy of data science agents, systematically analyzing and mapping forty-five systems onto the six stages of the end-to-end data science process: business understanding and data acquisition, exploratory analysis and visualization, feature engineering, model building and selection, interpretation and explanation, and deployment and monitoring. In addition to lifecycle coverage, we annotate each agent along five cross-cutting design dimensions: reasoning and planning style, modality integration, tool orchestration depth, learning and alignment methods, and trust, safety, and governance mechanisms. Beyond classification, we provide a critical synthesis of agent capabilities, highlight strengths and limitations at each stage, and review emerging benchmarks and evaluation practices. Our analysis identifies three key trends: most systems emphasize exploratory analysis, visualization, and modeling while neglecting business understanding, deployment, and monitoring; multimodal reasoning and tool orchestration remain unresolved challenges; and over 90% lack explicit trust and safety mechanisms. We conclude by outlining open challenges in alignment stability, explainability, governance, and robust evaluation frameworks, and propose future research directions to guide the development of robust, trustworthy, low-latency, transparent, and broadly accessible data science agents.
Problem

Research questions and friction points this paper is trying to address.

Automating data science workflow stages using LLM-based AI agents
Classifying data science agents across lifecycle stages and design dimensions
Addressing limitations in deployment, trust mechanisms, and multimodal reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Surveys agents that automate the data science workflow through planning and tool use
Analyzes multimodal reasoning across text, code, tables, and visuals
Systematically classifies 45 agents across six lifecycle stages and five design dimensions
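The lifecycle-and-dimension taxonomy described above can be sketched as a simple data schema. This is an illustrative reconstruction, not code from the paper; the class and function names (`AgentRecord`, `coverage_gaps`) and the example agent are hypothetical:

```python
from dataclasses import dataclass, field
from enum import Enum

class LifecycleStage(Enum):
    """The six end-to-end data science stages used by the survey's taxonomy."""
    BUSINESS_UNDERSTANDING = "business understanding and data acquisition"
    EXPLORATORY_ANALYSIS = "exploratory analysis and visualization"
    FEATURE_ENGINEERING = "feature engineering"
    MODEL_BUILDING = "model building and selection"
    INTERPRETATION = "interpretation and explanation"
    DEPLOYMENT = "deployment and monitoring"

# The five cross-cutting design dimensions annotated per agent.
DESIGN_DIMENSIONS = (
    "reasoning and planning style",
    "modality integration",
    "tool orchestration depth",
    "learning and alignment methods",
    "trust, safety, and governance",
)

@dataclass
class AgentRecord:
    """One surveyed agent: which stages it covers and how it is annotated."""
    name: str
    stages: set                              # subset of LifecycleStage
    dimensions: dict = field(default_factory=dict)  # dimension -> annotation

def coverage_gaps(agents):
    """Return lifecycle stages covered by no agent in the corpus."""
    covered = set().union(*(a.stages for a in agents)) if agents else set()
    return set(LifecycleStage) - covered

# Hypothetical record illustrating the survey's key finding: strong coverage
# of exploration and modeling, no reported trust/safety mechanism.
agent = AgentRecord(
    name="ExampleAgent",
    stages={LifecycleStage.EXPLORATORY_ANALYSIS, LifecycleStage.MODEL_BUILDING},
    dimensions={"trust, safety, and governance": "none reported"},
)
print(len(coverage_gaps([agent])))
```

A corpus-level pass of `coverage_gaps` over all 45 records would surface exactly the pattern the survey reports: business understanding, deployment, and monitoring left uncovered by most systems.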