AIRA$_2$: Overcoming Bottlenecks in AI Research Agents

📅 2026-03-27
🤖 AI Summary
This work identifies and addresses three structural bottlenecks in current AI research agents: low throughput from synchronous single-GPU execution, a generalization gap induced by validation-based selection, and the limited capability of fixed, single-turn LLM operators. To address them, AIRA$_2$ uses an asynchronous multi-GPU worker pool that increases experiment throughput linearly, a Hidden Consistent Evaluation protocol that delivers a reliable evaluation signal, and ReAct agents that dynamically scope their actions and debug interactively. On MLE-bench-30, it achieves a mean Percentile Rank of 71.8% at 24 hours, surpassing the previous best of 69.9%, and improves to 76.0% at 72 hours.
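As a rough illustration of the asynchronous worker-pool idea, the sketch below runs a queue of candidate experiments across several simulated GPU workers, so a slow experiment on one worker does not block the others. It is not the paper's implementation: `run_experiment`, `worker`, `run_pool`, the sleep-based stand-in for training, and the random scores are all hypothetical names and placeholders chosen for this example.

```python
import asyncio
import random


async def run_experiment(gpu_id: int, candidate: int) -> dict:
    # Hypothetical stand-in for training and scoring one candidate on one GPU.
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return {"gpu": gpu_id, "candidate": candidate, "score": random.random()}


async def worker(gpu_id: int, queue: asyncio.Queue, results: list) -> None:
    # Each worker pulls the next candidate as soon as it is free,
    # instead of waiting for the other workers to finish.
    while True:
        candidate = await queue.get()
        try:
            results.append(await run_experiment(gpu_id, candidate))
        finally:
            queue.task_done()


async def run_pool(num_gpus: int, candidates: list) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    for candidate in candidates:
        queue.put_nowait(candidate)
    results: list = []
    workers = [
        asyncio.create_task(worker(gpu_id, queue, results))
        for gpu_id in range(num_gpus)
    ]
    await queue.join()  # wait until every queued candidate has been processed
    for task in workers:
        task.cancel()   # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    finished = asyncio.run(run_pool(num_gpus=4, candidates=list(range(12))))
    print(len(finished), "experiments finished")
```

Results arrive in completion order rather than submission order, which is the property that lets throughput grow with the number of workers in a setup like this.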
📝 Abstract
Existing research has identified three structural performance bottlenecks in AI research agents: (1) synchronous single-GPU execution constrains sample throughput, limiting the benefit of search; (2) a generalization gap where validation-based selection causes performance to degrade over extended search horizons; and (3) the limited capability of fixed, single-turn LLM operators imposes a ceiling on search performance. We introduce AIRA$_2$, which addresses these bottlenecks through three architectural choices: an asynchronous multi-GPU worker pool that increases experiment throughput linearly; a Hidden Consistent Evaluation protocol that delivers a reliable evaluation signal; and ReAct agents that dynamically scope their actions and debug interactively. On MLE-bench-30, AIRA$_2$ achieves a mean Percentile Rank of 71.8% at 24 hours, surpassing the previous best of 69.9%, and steadily improves to 76.0% at 72 hours. Ablation studies reveal that each component is necessary and that the "overfitting" reported in prior work was driven by evaluation noise rather than true data memorization.
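The claim that apparent "overfitting" can come from evaluation noise rather than memorization can be illustrated with a toy simulation. The sketch below is not the paper's ablation and does not model the Hidden Consistent Evaluation protocol. It only assumes that each candidate has a fixed true quality, that the validation score is that quality plus independent Gaussian noise, and that the search keeps the candidate with the best noisy score. The function name `selection_gap` and all distributions are assumptions made for this example.

```python
import random
import statistics


def selection_gap(num_candidates: int, noise_std: float,
                  trials: int = 2000, seed: int = 0) -> float:
    """Mean of (noisy score - true quality) for the selected candidate."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        # Assumed model: true quality ~ N(0, 1), evaluation noise ~ N(0, noise_std).
        true = [rng.gauss(0.0, 1.0) for _ in range(num_candidates)]
        noisy = [t + rng.gauss(0.0, noise_std) for t in true]
        # Select by the noisy validation score, as a validation-based search would.
        best = max(range(num_candidates), key=noisy.__getitem__)
        gaps.append(noisy[best] - true[best])
    return statistics.mean(gaps)


if __name__ == "__main__":
    for n in (5, 50, 500):
        print(n, round(selection_gap(n, noise_std=1.0), 3))
```

Under these assumptions the gap between the selected candidate's validation score and its true quality widens as more candidates are compared, even though no candidate memorizes anything. With `noise_std=0.0` the gap is zero.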
Problem

Research questions and friction points this paper is trying to address.

AI research agents
performance bottlenecks
generalization gap
single-GPU execution
LLM operators
Innovation

Methods, ideas, or system contributions that make the work stand out.

asynchronous multi-GPU
Hidden Consistent Evaluation
ReAct agents
generalization gap
AI research agents
Karen Hambardzumyan
FAIR, Meta + University College London
Interpretability, Natural Language Processing, Few-Shot Learning
Nicolas Baldwin
FAIR at Meta
Edan Toledo
Meta & UCL
Reinforcement Learning, Natural Language Processing, Multi Agent Reinforcement Learning
Rishi Hazra
FAIR at Meta
Michael Kuchnik
Meta
computer systems, machine learning
Bassel Al Omari
University of Waterloo
Thomas Simon Foster
FAIR at Meta, University of Oxford
Anton Protopopov
FAIR at Meta
Jean-Christophe Gagnon-Audet
FAIR at Meta
Ishita Mediratta
Meta FAIR
Deep Learning, Multimodal Learning, Reinforcement Learning
Kelvin Niu
FAIR at Meta
Michael Shvartsman
Research Scientist, Meta Reality Labs Research
Computational cognitive science and machine learning for neuroscience
Alisia Lupidi
University of Cambridge
Alexis Audran-Reiss
FAIR at Meta
Parth Pathak
FAIR at Meta
Tatiana Shavrina
Meta
Natural language processing, computational linguistics, benchmarking, multilinguality
Despoina Magka
University of Oxford, Department of Computer Science
Artificial intelligence, Knowledge representation and reasoning, Logic
Hela Momand
FAIR at Meta
Derek Dunfield
FAIR at Meta
Nicola Cancedda
Research Scientist Manager, FAIR, Meta
AI, ML, NLP
Pontus Stenetorp
University College London
Carole-Jean Wu
Meta AI / FAIR
Machine Learning Systems, Computer Architecture, Memory Subsystem Design, Energy, Sustainability
Jakob Nicolaus Foerster
FAIR at Meta, University of Oxford
Yoram Bachrach
Meta (FAIR)
Artificial Intelligence, Machine Learning, Multiagent Systems
Martin Josifoski
Meta