AutoMind: Adaptive Knowledgeable Agent for Automated Data Science

📅 2025-06-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current LLM-driven data science agents perform poorly on complex, innovative machine learning tasks because they rely on rigid pipelines and model domain expertise inadequately. This paper introduces AutoMind, an adaptive, knowledgeable LLM-agent framework for automated data science. AutoMind addresses these limitations through three core innovations: (1) a curated expert knowledge base that grounds the agent in domain expertise; (2) an agentic knowledgeable tree search algorithm that strategically explores candidate solutions; and (3) a self-adaptive coding strategy that tailors code generation to task complexity. The method combines domain knowledge retrieval, tree-based reasoning, and dynamic code generation. Evaluated on two automated data science benchmarks, AutoMind outperforms state-of-the-art baselines, and additional analyses report favorable effectiveness, efficiency, and qualitative solution quality.

📝 Abstract
Large Language Model (LLM) agents have shown great potential in addressing real-world data science problems. LLM-driven data science agents promise to automate the entire machine learning pipeline, yet their real-world effectiveness remains limited. Existing frameworks depend on rigid, pre-defined workflows and inflexible coding strategies; consequently, they excel only on relatively simple, classical problems and fail to capture the empirical expertise that human practitioners bring to complex, innovative tasks. In this work, we introduce AutoMind, an adaptive, knowledgeable LLM-agent framework that overcomes these deficiencies through three key advances: (1) a curated expert knowledge base that grounds the agent in domain expert knowledge, (2) an agentic knowledgeable tree search algorithm that strategically explores possible solutions, and (3) a self-adaptive coding strategy that dynamically tailors code generation to task complexity. Evaluations on two automated data science benchmarks demonstrate that AutoMind delivers superior performance versus state-of-the-art baselines. Additional analyses confirm favorable effectiveness, efficiency, and qualitative solution quality, highlighting AutoMind as an efficient and robust step toward fully automated data science.
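The abstract does not specify how the tree search works, so the following is only a minimal sketch of a generic best-first search over candidate solution plans, not AutoMind's actual algorithm. The names `tree_search`, `toy_expand`, `toy_evaluate`, and `KNOWLEDGE` are all hypothetical; in a real agent, `expand` would presumably call an LLM informed by retrieved knowledge, and `evaluate` would run the generated code against a validation metric.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Plan = Tuple[str, ...]


@dataclass(order=True)
class Node:
    neg_score: float                       # heapq is a min-heap, so store -score
    plan: Plan = field(compare=False)
    depth: int = field(compare=False, default=0)


def tree_search(
    root_plan: Plan,
    expand: Callable[[Plan], List[Plan]],
    evaluate: Callable[[Plan], float],
    budget: int = 10,
    max_depth: int = 3,
) -> Tuple[Plan, float]:
    """Best-first search: repeatedly refine the most promising plan so far."""
    root = Node(-evaluate(root_plan), root_plan, 0)
    frontier = [root]
    best = root
    for _ in range(budget):
        if not frontier:
            break
        node = heapq.heappop(frontier)     # highest-scoring unexplored plan
        if node.depth >= max_depth:
            continue
        for child_plan in expand(node.plan):
            child = Node(-evaluate(child_plan), child_plan, node.depth + 1)
            if child.neg_score < best.neg_score:
                best = child               # track the best plan seen anywhere
            heapq.heappush(frontier, child)
    return best.plan, -best.neg_score


# Toy stand-ins (assumptions): a "knowledge base" of three tricks, and a
# score that simply rewards plans that use more of them.
KNOWLEDGE = ("target encoding", "gradient boosting", "cross-validation")


def toy_expand(plan: Plan) -> List[Plan]:
    return [plan + (k,) for k in KNOWLEDGE if k not in plan]


def toy_evaluate(plan: Plan) -> float:
    return len(plan) / len(KNOWLEDGE)


best_plan, best_score = tree_search((), toy_expand, toy_evaluate)
```

The priority queue keeps the search focused on the highest-scoring plans, while the `budget` and `max_depth` limits bound the number of expensive expand-and-evaluate calls.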
Problem

Research questions and friction points this paper is trying to address.

Overcoming rigid workflows in LLM-driven data science agents
Enhancing adaptability in automated machine learning pipelines
Capturing empirical expertise for complex, innovative tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curated expert knowledge base for domain expertise
Agentic knowledgeable tree search for solution exploration
Self-adaptive coding strategy for dynamic complexity handling
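As a rough illustration of the third item, a coding strategy that adapts to task complexity could be sketched as below. This is an assumption-laden toy, not the paper's method: the complexity heuristic (step count), the `threshold` value, and the names `SolutionPlan`, `choose_strategy`, `generate`, and `fake_llm` are all hypothetical, and `write_code` merely stands in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SolutionPlan:
    steps: List[str]


def complexity_score(plan: SolutionPlan) -> int:
    # Hypothetical heuristic: more steps means a more complex task.
    return len(plan.steps)


def choose_strategy(plan: SolutionPlan, threshold: int = 3) -> str:
    """Pick one-pass generation for simple plans, stepwise for complex ones."""
    return "one_pass" if complexity_score(plan) <= threshold else "stepwise"


def generate(
    plan: SolutionPlan,
    write_code: Callable[[str], str],
    threshold: int = 3,
) -> List[str]:
    """Return the generated code chunks for a plan."""
    if choose_strategy(plan, threshold) == "one_pass":
        # Simple task: request the whole solution in a single call.
        return [write_code("; ".join(plan.steps))]
    # Complex task: generate one step at a time so each can be checked.
    return [write_code(step) for step in plan.steps]


def fake_llm(prompt: str) -> str:
    # Stand-in for a model call; it only echoes the prompt as a comment.
    return f"# code for: {prompt}"


simple = SolutionPlan(["load data", "fit logistic regression"])
complex_plan = SolutionPlan(
    ["load data", "engineer features", "tune model", "ensemble", "write submission"]
)

simple_chunks = generate(simple, fake_llm)
complex_chunks = generate(complex_plan, fake_llm)
```

Here the simple plan yields a single chunk and the complex plan yields one chunk per step; a real system would likely use a richer complexity estimate than step count.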
Yixin Ou
Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph
Yujie Luo
Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph
Jingsheng Zheng
Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph
Lanning Wei
Ant Group
Shuofei Qiao
Zhejiang University
AI Agent, Large Language Models, Natural Language Processing, Knowledge Graphs
Jintian Zhang
Zhejiang University
NLP, LLMs
Da Zheng
Amazon
High-performance computing, Data-intensive computing, Large-scale machine learning, Graph neural networks
Huajun Chen
Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph
Ningyu Zhang
Zhejiang University