Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search

📅 2026-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes Gome, an agent framework that brings gradient-based optimization into machine learning engineering (MLE) agent design, a setting traditionally dominated by gradient-free tree search, whose efficiency lags behind advances in large language model (LLM) reasoning. Gome reformulates structured diagnostic reasoning as gradient computation, treats a memory of successful steps as momentum, and runs multiple trajectories in parallel to emulate distributed optimization, establishing a gradient-driven optimization paradigm for MLE tasks. Evaluated on MLE-Bench under stringent constraints (a single V100 GPU and a 12-hour time limit), Gome achieves a 35.1% any-medal success rate. Experiments across ten models show a pronounced advantage over conventional tree-search approaches, particularly with strong reasoning LLMs, overcoming the scalability limitations of those methods.

📝 Abstract
LLM-based agents for machine learning engineering (MLE) predominantly rely on tree search, a form of gradient-free optimization that uses scalar validation scores to rank candidates. As LLM reasoning capabilities improve, exhaustive enumeration becomes increasingly inefficient compared to directed updates, analogous to how accurate gradients enable efficient descent over random search. We introduce Gome, an MLE agent that operationalizes gradient-based optimization. Gome maps structured diagnostic reasoning to gradient computation, success memory to momentum, and multi-trace execution to distributed optimization. Under a closed-world protocol that isolates architectural effects from external knowledge, Gome achieves a state-of-the-art 35.1% any-medal rate on MLE-Bench with a restricted 12-hour budget on a single V100 GPU. Scaling experiments across 10 models reveal a critical crossover: with weaker models, tree search retains advantages by compensating for unreliable reasoning through exhaustive exploration; as reasoning capability strengthens, gradient-based optimization progressively outperforms, with the gap widening at frontier-tier models. Given the rapid advancement of reasoning-oriented LLMs, this positions gradient-based optimization as an increasingly favorable paradigm. We release our codebase and GPT-5 traces.
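The optimizer analogy in the abstract (diagnostic reasoning as gradient, success memory as momentum, multi-trace execution as distributed optimization) can be sketched with a toy numeric objective. This is one reading of the abstract, not the authors' implementation: all names are hypothetical, and a quadratic validation score stands in for the LLM-driven evaluate-and-diagnose loop.

```python
# Hypothetical sketch of the gradient-with-momentum analogy described in
# the abstract. A toy quadratic "validation score" replaces the real
# MLE pipeline; diagnose() plays the role of structured diagnostic
# reasoning producing a directed update.
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    params: float                       # stand-in for a candidate solution
    momentum: float = 0.0               # accumulated direction of past successful updates
    history: list = field(default_factory=list)

def validation_score(params: float) -> float:
    """Toy objective: higher is better, peak at params == 3.0."""
    return -(params - 3.0) ** 2

def diagnose(params: float) -> float:
    """Stand-in for diagnostic reasoning: returns an update direction,
    playing the role of a gradient of the validation score."""
    return -2.0 * (params - 3.0)

def step(traj: Trajectory, lr: float = 0.2, beta: float = 0.5) -> None:
    """One gradient-with-momentum update on a single trajectory."""
    g = diagnose(traj.params)
    traj.momentum = beta * traj.momentum + g   # success memory as momentum
    traj.params += lr * traj.momentum
    traj.history.append(validation_score(traj.params))

# Multi-trace execution: independent starting points run in parallel
# (sequentially here), and the best-scoring candidate is kept.
trajectories = [Trajectory(params=p) for p in (-4.0, 0.0, 8.0)]
for _ in range(50):
    for t in trajectories:
        step(t)

best = max(trajectories, key=lambda t: validation_score(t.params))
print(round(best.params, 2))  # → 3.0 (all traces converge to the optimum)
```

The contrast the paper draws is visible in this framing: tree search would enumerate and rank many candidates by scalar score alone, whereas the directed update above exploits a richer diagnostic signal at each step.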
Problem

Research questions and friction points this paper is trying to address.

LLM-based agents
machine learning engineering
tree search
gradient-based optimization
reasoning capability
Innovation

Methods, ideas, or system contributions that make the work stand out.

gradient-based optimization
LLM reasoning
MLE agent
tree search
scaling laws
Authors

Yifei Zhang (Microsoft Research Asia)
Xu Yang (Microsoft Research Asia)
Xiao Yang (Microsoft)
Bowen Xian (Microsoft Research Asia)
Qizheng Li (Microsoft Research Asia)
Shikai Fang (Zhejiang University)
Jingyuan Li (University of Washington)
Jian Wang (Microsoft Research Asia)
Mingrui Xu (Zhejiang University)
Weiqing Liu (Microsoft Research Asia)
Jiang Bian (Microsoft Research)