SPIO: Ensemble and Selective Strategies via LLM-Based Multi-Agent Planning in Automated Data Science

📅 2025-03-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing AutoML pipelines rely on rigid, single-path workflows, which limits strategy exploration and flexibility and in turn constrains predictive performance. To address this, we propose SPIO, an LLM-driven, multi-agent collaborative AutoML framework that orchestrates preprocessing, feature engineering, modeling, and hyperparameter optimization through modular strategy generation, sequential plan integration and refinement, k-best ensemble construction, and LLM-based meta-evaluation for component selection. SPIO comes in two complementary variants: SPIO-S (single-path selection), which uses the LLM to pick one unified pipeline as the best, and SPIO-E (ensemble-based multi-path integration), which combines several high-performing pipelines. Evaluated across multiple Kaggle and OpenML benchmarks, SPIO consistently outperforms state-of-the-art AutoML systems in both accuracy and robustness, while demonstrating strong scalability and generalization.

📝 Abstract

Large Language Models (LLMs) have revolutionized automated data analytics and machine learning by enabling dynamic reasoning and adaptability. While recent approaches have advanced multi-stage pipelines through multi-agent systems, they typically rely on rigid, single-path workflows that limit the exploration and integration of diverse strategies, often resulting in suboptimal predictions. To address these challenges, we propose SPIO (Sequential Plan Integration and Optimization), a novel framework that leverages LLM-driven decision-making to orchestrate multi-agent planning across four key modules: data preprocessing, feature engineering, modeling, and hyperparameter tuning. In each module, dedicated planning agents independently generate candidate strategies that cascade into subsequent stages, fostering comprehensive exploration. A plan optimization agent then refines these strategies by proposing several optimized plans. We further introduce two variants: SPIO-S, which selects a single best solution path as determined by the LLM, and SPIO-E, which selects the top k candidate plans and ensembles them to maximize predictive performance. Extensive experiments on Kaggle and OpenML datasets demonstrate that SPIO significantly outperforms state-of-the-art methods, providing a robust and scalable solution for automated data science tasks.
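
The difference between the two variants can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `CandidatePlan` class, the numeric `score` (standing in here for the LLM's ranking of plans), and the use of simple probability averaging as the ensembling rule are all assumptions made for illustration, since the abstract does not specify how plans are ranked or combined.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CandidatePlan:
    """Hypothetical container for one end-to-end plan and its predictions."""
    name: str
    score: float                # assumed ranking signal, higher is better
    probabilities: List[float]  # predicted positive-class probability per row


def select_single(plans: Sequence[CandidatePlan]) -> CandidatePlan:
    """SPIO-S-style choice: keep exactly one plan (here, the top-ranked one)."""
    return max(plans, key=lambda p: p.score)


def ensemble_top_k(plans: Sequence[CandidatePlan], k: int) -> List[float]:
    """SPIO-E-style choice: average the predictions of the k top-ranked plans."""
    if not 1 <= k <= len(plans):
        raise ValueError("k must be between 1 and the number of plans")
    top = sorted(plans, key=lambda p: p.score, reverse=True)[:k]
    n_rows = len(top[0].probabilities)
    if any(len(p.probabilities) != n_rows for p in top):
        raise ValueError("all plans must predict the same number of rows")
    return [sum(p.probabilities[i] for p in top) / k for i in range(n_rows)]


# Made-up example data, not results from the paper.
plans = [
    CandidatePlan("plan_a", 0.81, [0.9, 0.2, 0.6]),
    CandidatePlan("plan_b", 0.85, [0.7, 0.4, 0.8]),
    CandidatePlan("plan_c", 0.78, [0.1, 0.9, 0.5]),
]
best = select_single(plans)           # single-path selection
blended = ensemble_top_k(plans, k=2)  # blend of the two top-ranked plans
```

With `k=1` the ensemble reduces to the single selected plan, which is one way to see SPIO-S as a special case of the top-k idea under these assumptions.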

Problem

Research questions and friction points this paper is trying to address.

Overcoming rigid single-path workflows in automated data science

Enhancing diverse strategy exploration via multi-agent planning

Optimizing predictive performance through ensemble and selective strategies

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven multi-agent planning for diverse strategies

Four-stage optimization: data preprocessing, feature engineering, modeling, hyperparameter tuning

Two variants: single-path selection and ensemble top-k
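
To make the idea of per-stage candidates cascading into end-to-end paths concrete, here is a small Python sketch. The strategy names are invented placeholders, and enumerating the full cross product of candidates is an assumption for illustration; the summary does not say how the actual system combines or prunes candidates across stages.

```python
from itertools import product
from typing import Dict, List, Tuple

# The four modules named in the abstract, in pipeline order.
STAGES = ("preprocessing", "feature_engineering", "modeling", "hyperparameter_tuning")

# Hypothetical candidates; in the described system, planning agents would propose these.
candidates: Dict[str, List[str]] = {
    "preprocessing": ["median_impute", "drop_missing_rows"],
    "feature_engineering": ["one_hot", "target_encoding"],
    "modeling": ["gradient_boosting", "random_forest"],
    "hyperparameter_tuning": ["random_search"],
}


def enumerate_paths(options: Dict[str, List[str]]) -> List[Tuple[str, ...]]:
    """List every path formed by picking one strategy per stage, in stage order."""
    return list(product(*(options[stage] for stage in STAGES)))


paths = enumerate_paths(candidates)
print(len(paths))  # 2 * 2 * 2 * 1 = 8 candidate paths
print(paths[0])
```

Each tuple is one candidate solution path; a selection step would then keep one of them (single-path) or several (top-k ensemble).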

🔎 Similar Papers

AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML