ArchPilot: A Proxy-Guided Multi-Agent Approach for Machine Learning Engineering

📅 2025-11-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM-driven AutoML agents rely on repeated full-model training to evaluate candidate architectures, resulting in prohibitive computational overhead, low search efficiency, and poor scalability. To address this, we propose a multi-agent collaborative framework comprising a generation agent, an evaluation agent, and an orchestration (coordination) agent. Our method integrates fidelity-aware score aggregation, a lightweight proxy-based evaluation mechanism, a memory-enhanced modified Monte Carlo Tree Search (MCTS) algorithm, and a dynamic restart strategy to minimize expensive training calls. The core innovation lies in replacing a substantial portion of costly training with agent-mediated proxy scoring, enabling a tightly coupled, end-to-end closed loop of architecture generation, evaluation, and search. On MLE-Bench, our approach outperforms state-of-the-art systems including AIDE and ML-Master, achieving higher search efficiency and better scalability under constrained computational budgets.

📝 Abstract
Recent LLM-based agents have demonstrated strong capabilities in automated ML engineering. However, they rely heavily on repeated full training runs to evaluate candidate solutions, resulting in significant computational overhead, limited scalability to large search spaces, and slow iteration cycles. To address these challenges, we introduce ArchPilot, a multi-agent system that integrates architecture generation, proxy-based evaluation, and adaptive search into a unified framework. ArchPilot consists of three specialized agents: an orchestration agent that coordinates the search process using a novel Monte Carlo Tree Search (MCTS)-inspired algorithm with a restart mechanism and manages memory of previous candidates; a generation agent that iteratively generates, improves, and debugs candidate architectures; and an evaluation agent that executes proxy training runs, generates and optimizes proxy functions, and aggregates the proxy scores into a fidelity-aware performance metric. This multi-agent collaboration allows ArchPilot to prioritize high-potential candidates with minimal reliance on expensive full training runs, facilitating efficient ML engineering under limited budgets. Experiments on MLE-Bench demonstrate that ArchPilot outperforms state-of-the-art (SOTA) baselines such as AIDE and ML-Master, validating the effectiveness of our multi-agent system.
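The abstract says the evaluation agent aggregates proxy scores into a "fidelity-aware performance metric" but does not give the formula. A minimal sketch, assuming each proxy run reports a score (higher is better) and a fidelity in (0, 1], here taken to be the fraction of the full training budget the run used, is a fidelity-weighted mean. `ProxyResult`, `fidelity`, and `aggregate_fidelity_aware` are hypothetical names for illustration, not ArchPilot's actual aggregation rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProxyResult:
    score: float     # validation metric from one proxy run, higher is better
    fidelity: float  # hypothetical: fraction of the full training budget used, in (0, 1]


def aggregate_fidelity_aware(results: List[ProxyResult]) -> float:
    """Fidelity-weighted mean of proxy scores (an assumed aggregation rule).

    Runs closer to full training get proportionally more weight, so a cheap,
    noisy proxy cannot outvote a more faithful one.
    """
    if not results:
        raise ValueError("need at least one proxy result")
    total_weight = sum(r.fidelity for r in results)
    if total_weight <= 0:
        raise ValueError("fidelities must be positive")
    return sum(r.fidelity * r.score for r in results) / total_weight
```

For example, a 10%-budget run scoring 0.70 and a 30%-budget run scoring 0.80 aggregate to (0.1 × 0.70 + 0.3 × 0.80) / 0.4 = 0.775, so the higher-fidelity run dominates. A weighted mean is only one plausible choice; rank-based or learned aggregations would fit the same interface.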
Problem

Research questions and friction points this paper is trying to address.

Reducing computational overhead in automated ML engineering systems
Improving scalability for large architecture search spaces
Accelerating iteration cycles through proxy-guided evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent system with proxy-based evaluation
Monte Carlo Tree Search with restart mechanism
Fidelity-aware performance metric for candidate prioritization
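The abstract names an MCTS-inspired search with a restart mechanism and a memory of previous candidates but does not spell out the algorithm. For reference only, the sketch below is a standard UCT-based Monte Carlo Tree Search with two assumed additions: a score cache standing in for candidate memory, and a restart that re-roots the tree at the best candidate after `patience` iterations without improvement. `propose`, `score`, `patience`, and the numeric candidate encoding are hypothetical placeholders for the generation and evaluation agents, not ArchPilot's interface.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Node:
    candidate: float  # hypothetical: a candidate encoded as a number
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def uct(self, c: float) -> float:
        """Upper Confidence bound for Trees: mean value plus an exploration bonus."""
        if self.visits == 0 or self.parent is None:
            return math.inf
        exploit = self.value_sum / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def search(
    root_candidate: float,
    propose: Callable[[float, random.Random], float],  # hypothetical generation step
    score: Callable[[float], float],  # hypothetical proxy evaluation, higher is better
    iterations: int = 200,
    branching: int = 3,
    c: float = 1.4,
    patience: int = 30,
    seed: int = 0,
) -> Tuple[float, float, int]:
    rng = random.Random(seed)
    # Memory of previous candidates: repeated candidates are not re-scored.
    memory: Dict[float, float] = {root_candidate: score(root_candidate)}
    best_cand, best_score = root_candidate, memory[root_candidate]
    root = Node(root_candidate)
    since_improve, restarts = 0, 0

    for _ in range(iterations):
        # Selection: descend through fully expanded nodes by UCT.
        node = root
        while len(node.children) >= branching:
            node = max(node.children, key=lambda ch: ch.uct(c))
        # Expansion: ask the generator for one new candidate.
        child = Node(propose(node.candidate, rng), parent=node)
        node.children.append(child)
        # Evaluation: reuse a remembered score instead of re-running the proxy.
        if child.candidate not in memory:
            memory[child.candidate] = score(child.candidate)
        value = memory[child.candidate]
        # Backpropagation: update statistics along the path to the root.
        n: Optional[Node] = child
        while n is not None:
            n.visits += 1
            n.value_sum += value
            n = n.parent
        if value > best_score:
            best_cand, best_score, since_improve = child.candidate, value, 0
        else:
            since_improve += 1
        # Restart: after `patience` stale iterations, re-root at the best candidate.
        if since_improve >= patience:
            root, since_improve, restarts = Node(best_cand), 0, restarts + 1

    return best_cand, best_score, restarts
```

As a toy usage, `search(0.0, lambda x, rng: round(x + rng.gauss(0, 0.5), 2), lambda x: -(x - 3.0) ** 2)` climbs toward 3.0, and the returned `restarts` counts how often the search abandoned a stale subtree. Re-rooting at the incumbent while keeping the score cache is one simple way to trade tree statistics for fresh exploration without paying again for already-evaluated candidates.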
Zhuowen Yuan (Meta Ranking AI Research, UIUC)
Tao Liu (Meta Ranking AI Research)
Yang Yang (Meta Ranking AI Research)
Yang Wang (Meta Ranking AI Research)
Feng Qi (Retired researcher; Special Functions, Analytic Combinatorics, Analytic Number Theory, Mathematical Inequalities)
Kaushik Rangadurai (Researcher at Meta; Machine Learning, Artificial Intelligence, Search)
Bo Li (UIUC)
Shuang Yang (Meta Ranking AI Research)