ArchPilot: A Proxy-Guided Multi-Agent Approach for Machine Learning Engineering

📅 2025-11-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing LLM-driven AutoML agents rely on repeated full-model training to evaluate candidate architectures, resulting in prohibitive computational overhead, low search efficiency, and poor scalability. To address this, we propose a multi-agent collaborative framework comprising a generation agent, an evaluation agent, and a coordination agent. Our method integrates fidelity-aware score aggregation, a lightweight proxy-based evaluation mechanism, a memory-enhanced modified Monte Carlo Tree Search (MCTS) algorithm, and a dynamic restart strategy to minimize expensive training calls. The core innovation lies in replacing a substantial portion of costly training with agent-mediated scoring, which enables a tightly coupled, end-to-end closed loop of architecture generation, evaluation, and search. On MLE-Bench, our approach outperforms state-of-the-art systems including AIDE and ML-Master, achieving better search efficiency and scalability under constrained computational budgets.

📝 Abstract

Recent LLM-based agents have demonstrated strong capabilities in automated ML engineering. However, they rely heavily on repeated full training runs to evaluate candidate solutions, resulting in significant computational overhead, limited scalability to large search spaces, and slow iteration cycles. To address these challenges, we introduce ArchPilot, a multi-agent system that integrates architecture generation, proxy-based evaluation, and adaptive search into a unified framework. ArchPilot consists of three specialized agents: an orchestration agent that coordinates the search process using a novel Monte Carlo Tree Search (MCTS)-inspired algorithm with a restart mechanism and manages memory of previous candidates; a generation agent that iteratively generates, improves, and debugs candidate architectures; and an evaluation agent that executes proxy training runs, generates and optimizes proxy functions, and aggregates the proxy scores into a fidelity-aware performance metric. This multi-agent collaboration allows ArchPilot to prioritize high-potential candidates with minimal reliance on expensive full training runs, facilitating efficient ML engineering under limited budgets. Experiments on MLE-Bench demonstrate that ArchPilot outperforms SOTA baselines such as AIDE and ML-Master, validating the effectiveness of our multi-agent system.
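The abstract does not specify how proxy scores are combined into the fidelity-aware metric. As a rough illustration only, the sketch below assumes each proxy reports a score normalized to [0, 1] together with a fidelity weight (for example, how well that proxy has tracked full-training results so far) and combines them with a weighted mean. The names `ProxyResult` and `aggregate`, the normalization, and the weighting rule are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ProxyResult:
    name: str        # hypothetical proxy identifier
    score: float     # proxy score, assumed normalized to [0, 1]
    fidelity: float  # assumed trust weight in [0, 1]


def aggregate(results):
    """Fidelity-weighted mean of proxy scores (an assumed rule, not the paper's).

    Negative fidelities are clipped to zero; returns 0.0 if no proxy is trusted.
    """
    weights = [max(r.fidelity, 0.0) for r in results]
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(w * r.score for w, r in zip(weights, results))
    return weighted_sum / total_weight
```

Under this assumption, a proxy with low fidelity contributes little to the final ranking, so a candidate is prioritized mainly on the evidence from proxies that have proven reliable.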

Problem

Research questions and friction points this paper is trying to address.

Reducing computational overhead in automated ML engineering systems

Improving scalability for large architecture search spaces

Accelerating iteration cycles through proxy-guided evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent system with proxy-based evaluation

Monte Carlo Tree Search with restart mechanism

Fidelity-aware performance metric for candidate prioritization
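The page names an MCTS-inspired search with a restart mechanism but gives no algorithmic detail. The sketch below is therefore a generic stand-in: it uses the standard UCT selection rule and a simple patience-based restart check. It should not be read as ArchPilot's actual algorithm. `Node`, `uct`, `select_child`, `should_restart`, the exploration constant `c`, and `patience` are all illustrative assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str                  # hypothetical label for a candidate solution
    visits: int = 0            # number of times this node was evaluated
    total_score: float = 0.0   # sum of (proxy) scores observed at this node
    children: list = field(default_factory=list)


def uct(child, parent_visits, c=1.4):
    """Standard UCT value: mean score plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # always try unvisited candidates first
    mean = child.total_score / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(node):
    """Pick the child with the highest UCT value."""
    parent_visits = max(node.visits, 1)
    return max(node.children, key=lambda ch: uct(ch, parent_visits))


def should_restart(best_scores, patience=3):
    """True when the best score has not improved over the last `patience` steps."""
    if len(best_scores) <= patience:
        return False
    return max(best_scores[-patience:]) <= max(best_scores[:-patience])
```

In a loop built on this sketch, the search would call `select_child` to choose which candidate to refine next and consult `should_restart` after each iteration to decide whether to abandon a stagnating branch and begin again from a fresh root.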

🔎 Similar Papers

System for systematic literature review using multiple AI agents: Concept and an empirical evaluation