AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench

📅 2025-07-03
🤖 AI Summary
This study addresses the limited ability of AI research agents to solve realistic machine learning problems on MLE-bench, and in particular on MLE-bench Lite. Method: We formalize an AI research agent as a search policy that navigates a space of candidate solutions and iteratively modifies them with operators. We design several operator sets and systematically pair them with three search policies: Greedy, Monte Carlo Tree Search (MCTS), and Evolutionary. Agents are evaluated on Kaggle competition tasks, where they design, implement, and train machine learning models. Contribution/Results: The best pairing of search policy and operator set raises the rate of earning a Kaggle medal on MLE-bench Lite from 39.6% to 47.7%, a new state of the art. The results show that the interplay between search policy and operator set is critical for performance, and that search strategy, operator design, and evaluation methodology should be considered jointly.
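To put the headline figures in perspective, the short calculation below separates the absolute gain (in percentage points) from the relative gain. Only the two rates, 39.6% and 47.7%, come from the summary above; the variable names are illustrative.

```python
# Medal rates on MLE-bench Lite as quoted in the summary (percent).
baseline_rate = 39.6
best_rate = 47.7

# Absolute gain is a difference of percentages, so it is in percentage points.
absolute_gain = best_rate - baseline_rate

# Relative gain expresses the same difference as a fraction of the baseline.
relative_gain = absolute_gain / baseline_rate * 100

print(f"{absolute_gain:.1f} percentage points, {relative_gain:.1f}% relative")
```

The improvement is about 8.1 percentage points, or roughly a fifth more medals relative to the previous rate.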

📝 Abstract
AI research agents are demonstrating great potential to accelerate scientific progress by automating the design, implementation, and training of machine learning models. We focus on methods for improving agents' performance on MLE-bench, a challenging benchmark where agents compete in Kaggle competitions to solve real-world machine learning problems. We formalize AI research agents as search policies that navigate a space of candidate solutions, iteratively modifying them using operators. By designing and systematically varying different operator sets and search policies (Greedy, MCTS, Evolutionary), we show that their interplay is critical for achieving high performance. Our best pairing of search strategy and operator set achieves a state-of-the-art result on MLE-bench Lite, increasing the success rate of achieving a Kaggle medal from 39.6% to 47.7%. Our investigation underscores the importance of jointly considering the search strategy, operator design, and evaluation methodology in advancing automated machine learning.
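The formalization in the abstract (a search policy that navigates candidate solutions and modifies them with operators) can be illustrated with a minimal greedy sketch. This is not the paper's implementation: `Candidate`, the two operators, and `toy_score` are hypothetical stand-ins, and a real agent would score a candidate by training a model and measuring validation performance.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate solution, reduced to two toy hyperparameters."""
    depth: int
    lr: float


# An operator maps one candidate solution to a modified one.
Operator = Callable[[Candidate], Candidate]


def deepen(c: Candidate) -> Candidate:
    return Candidate(c.depth + 1, c.lr)


def halve_lr(c: Candidate) -> Candidate:
    return Candidate(c.depth, c.lr / 2)


def toy_score(c: Candidate) -> float:
    # Assumed stand-in for a validation score; it peaks at depth=4, lr=0.025.
    return -abs(c.depth - 4) - 10 * abs(c.lr - 0.025)


def greedy_search(root: Candidate, operators: List[Operator],
                  score: Callable[[Candidate], float], steps: int) -> Candidate:
    """Greedy policy: apply every operator to the current best candidate,
    move to the best child, and stop when no child improves the score."""
    best = root
    for _ in range(steps):
        children = [op(best) for op in operators]
        top = max(children, key=score)
        if score(top) <= score(best):
            break
        best = top
    return best


best = greedy_search(Candidate(depth=1, lr=0.1), [deepen, halve_lr], toy_score, steps=10)
print(best)
```

Swapping the list of operators or the policy function changes which candidates are reachable and in what order, which is the kind of interplay the abstract refers to.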
Problem

Research questions and friction points this paper is trying to address.

Improving AI research agents' performance on MLE-bench
Exploring search policies and operator sets for ML models
Achieving state-of-the-art results in automated machine learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI agents automate ML model design and training
Search policies optimize candidate solution navigation
Operator-search interplay boosts benchmark performance
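As a rough illustration of how a different search policy can navigate the same kind of solution space, here is a small evolutionary sketch. It is an assumption-laden toy, not the authors' method: the tuple-based candidate, the four operators, `toy_score`, and all parameter values are hypothetical.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical candidate solution: (model depth, learning rate).
Candidate = Tuple[int, float]
Operator = Callable[[Candidate], Candidate]


def deepen(c: Candidate) -> Candidate:
    return (c[0] + 1, c[1])


def shallow(c: Candidate) -> Candidate:
    return (max(1, c[0] - 1), c[1])


def halve_lr(c: Candidate) -> Candidate:
    return (c[0], c[1] / 2)


def double_lr(c: Candidate) -> Candidate:
    return (c[0], c[1] * 2)


def toy_score(c: Candidate) -> float:
    # Assumed stand-in for a validation score; it peaks at depth=4, lr=0.025.
    return -abs(c[0] - 4) - 10 * abs(c[1] - 0.025)


def evolutionary_search(root: Candidate, operators: List[Operator],
                        score: Callable[[Candidate], float], generations: int,
                        population_size: int, rng: random.Random) -> Candidate:
    """Evolutionary policy: mutate randomly chosen parents with randomly
    chosen operators, then keep the top-scoring candidates."""
    population = [root]
    for _ in range(generations):
        offspring = [rng.choice(operators)(rng.choice(population))
                     for _ in range(population_size)]
        # Parents stay in the pool, so the best score never decreases.
        pool = population + offspring
        population = sorted(pool, key=score, reverse=True)[:population_size]
    return population[0]


OPERATORS = [deepen, shallow, halve_lr, double_lr]
start = (1, 0.1)
found = evolutionary_search(start, OPERATORS, toy_score,
                            generations=20, population_size=8,
                            rng=random.Random(0))
print(found, toy_score(found))
```

Unlike a greedy policy, this one keeps a population and samples operators at random, so the same operator set can lead to a different search trajectory.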
Authors

Edan Toledo
Meta & UCL
Reinforcement Learning, Natural Language Processing, Multi Agent Reinforcement Learning

Karen Hambardzumyan
FAIR, Meta + University College London
Interpretability, Natural Language Processing, Few-Shot Learning

Martin Josifoski
Meta

Rishi Hazra
Örebro University

Nicolas Baldwin
FAIR at Meta

Alexis Audran-Reiss
FAIR at Meta

Michael Kuchnik
Meta
computer systems, machine learning

Despoina Magka
University of Oxford, Department of Computer Science
Artificial intelligence, Knowledge representation and reasoning, Logic

Minqi Jiang
FAIR at Meta

Alisia Maria Lupidi
FAIR at Meta

Andrei Lupu
University of Oxford & FAIR, Meta AI
Reinforcement Learning, Multi-Agent RL

Roberta Raileanu
Research Scientist at Google DeepMind, Honorary Lecturer at UCL
Artificial Intelligence, Reinforcement Learning, Deep Learning, Open-Ended Learning

Kelvin Niu
FAIR at Meta

Tatiana Shavrina
Meta
Natural language processing, computational linguistics, benchmarking, multilinguality

Jean-Christophe Gagnon-Audet
FAIR at Meta

Michael Shvartsman
Research Scientist, Meta Reality Labs Research
Computational cognitive science and machine learning for neuroscience

Shagun Sodhani
Google DeepMind
Machine Learning, Reinforcement Learning, Lifelong Learning

Alexander H. Miller
FAIR at Meta

Abhishek Charnalia
FAIR at Meta

Derek Dunfield
FAIR at Meta

Carole-Jean Wu
Meta AI / FAIR
Machine Learning Systems, Computer Architecture, Memory Subsystem Design, Energy, Sustainability

Pontus Stenetorp
University College London

Nicola Cancedda
Research Scientist Manager, FAIR, Meta
AI, ML, NLP

Jakob Nicolaus Foerster
FAIR at Meta

Yoram Bachrach
Meta (FAIR)
Artificial Intelligence, Machine Learning, Multiagent Systems