AIDE: AI-Driven Exploration in the Space of Code

📅 2025-02-18
📈 Citations: 0
Influential: 0

🤖 AI Summary
Machine learning engineering has long been constrained by costly, inefficient trial-and-error development. This paper introduces AI-Driven Exploration (AIDE), an LLM-powered machine learning engineering agent that frames model development as a code optimization problem and formulates trial-and-error as a tree search over the space of candidate solutions. Rather than restarting from scratch, AIDE reuses and refines promising solutions, trading additional computation for better performance. The paper reports state-of-the-art results on several machine learning engineering benchmarks, including the authors' Kaggle evaluations, OpenAI's MLE-Bench, and METR's RE-Bench.

📝 Abstract
Machine learning, the foundation of modern artificial intelligence, has driven innovations that have fundamentally transformed the world. Yet behind these advancements lies a complex and often tedious process requiring labor- and compute-intensive iteration and experimentation. Engineers and scientists developing machine learning models spend much of their time on trial-and-error tasks instead of conceptualizing innovative solutions or research hypotheses. To address this challenge, we introduce AI-Driven Exploration (AIDE), a machine learning engineering agent powered by large language models (LLMs). AIDE frames machine learning engineering as a code optimization problem and formulates trial-and-error as a tree search in the space of potential solutions. By strategically reusing and refining promising solutions, AIDE effectively trades computational resources for enhanced performance, achieving state-of-the-art results on multiple machine learning engineering benchmarks, including our Kaggle evaluations, OpenAI's MLE-Bench, and METR's RE-Bench.
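
The idea of treating trial-and-error as a tree search over candidate solutions can be illustrated with a minimal sketch. This is not the authors' implementation: the Node class, the tree_search function, the greedy "refine the best node so far" policy, and the toy draft/refine/evaluate stand-ins are all assumptions made for illustration. In a real agent, refine would presumably be an LLM call that edits code and evaluate would run that code and read back a validation metric.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One candidate solution in the search tree (hypothetical structure)."""
    solution: str
    score: float
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def tree_search(
    draft: Callable[[], str],
    refine: Callable[[str], str],
    evaluate: Callable[[str], float],
    n_drafts: int = 2,
    n_steps: int = 5,
) -> Node:
    """Greedy tree search: repeatedly refine the best-scoring node so far."""
    # Start from a few independent drafts (the roots of the tree).
    nodes: List[Node] = []
    for _ in range(n_drafts):
        solution = draft()
        nodes.append(Node(solution, evaluate(solution)))

    # Each step spends one evaluation to extend the most promising node.
    for _ in range(n_steps):
        parent = max(nodes, key=lambda n: n.score)
        child_solution = refine(parent.solution)
        child = Node(child_solution, evaluate(child_solution), parent=parent)
        parent.children.append(child)
        nodes.append(child)

    return max(nodes, key=lambda n: n.score)


# Toy stand-ins (assumptions): a "solution" is a string such as "lr=0.5".
def toy_evaluate(solution: str) -> float:
    # Higher is better; the best possible learning rate here is 0.125.
    lr = float(solution.split("=")[1])
    return -abs(lr - 0.125)


def toy_refine(solution: str) -> str:
    # Stand-in for an LLM edit: halve the learning rate.
    lr = float(solution.split("=")[1])
    return f"lr={lr / 2}"


drafts = iter(["lr=1.0", "lr=4.0"])
best = tree_search(lambda: next(drafts), toy_refine, toy_evaluate)
print(best.solution)  # lr=0.125
```

Raising n_steps spends more evaluations in exchange for a chance at a better score, which mirrors the compute-for-performance trade the abstract describes. A purely greedy policy like this one can keep re-refining the same node, so a practical search would likely need some exploration as well.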
Problem

Research questions and friction points this paper is trying to address.

Automating the machine learning experimentation process
Optimizing code through AI-driven exploration
Reducing trial-and-error in model development
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-Driven Exploration
Large Language Models
Code Optimization Problem