GEAR: Genetic AutoResearch for Agentic Code Evolution

📅 2026-05-07

🤖 AI Summary

Traditional autonomous research agents follow a single search trajectory, which makes them prone to local optima and causes them to overlook valuable information in failed experiments. This work proposes GEAR, a population-based autonomous research framework that integrates genetic algorithms into the scientific discovery process. GEAR maintains a diverse set of research states and combines multi-objective parent selection (balancing productivity, novelty, and coverage), code mutation and crossover operators, a reflective memory mechanism, and an evolvable program controller. Together these enable parallel exploration of multiple pathways and sustained long-term optimization. Under an identical compute budget, the GEAR variants consistently outperform the baseline AutoResearch agent, showing stronger exploration and continued performance improvement.

📝 Abstract

Autonomous research agents can already run machine learning experiments without human supervision, but many rely on a narrow search strategy: they repeatedly modify one program and keep changes only when they improve the current best result. This can cause them to discard useful partial ideas, alternative promising directions, and insights from failed or incomplete experiments. GEAR, or Genetic AutoResearch, replaces this single-path search with a population-based search over multiple research states. It keeps a set of strong candidate solutions, selects parents based on productivity, novelty, and coverage, and explores new ideas through mutation and crossover. Each research state stores its code changes, reflections, and performance data, allowing future decisions to build on past discoveries. The paper studies three versions of GEAR: one controlled through prompting, one using a fixed programmatic search controller, and one where the controller itself can evolve during the run. Under the same compute budget and environment, all three versions outperform the AutoResearch baseline. More importantly, while the baseline tends to settle into one local optimum, GEAR continues finding improvements over longer runs. Overall, the results suggest that autonomous research agents become more effective when they maintain multiple promising directions and can adapt their search strategy over time.
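The population-based loop described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation. Every name here (`ResearchState`, `parent_weight`, `run_search`) is hypothetical, the "program" is just a short string, and the formulas for productivity, novelty, and coverage are assumptions chosen only to make the idea concrete. In GEAR the mutation and crossover steps operate on real code, whereas here they are simple string operations.

```python
import random
from dataclasses import dataclass, field

ALPHABET = "abcd"   # toy "code" is a string over this alphabet (assumption)
LENGTH = 12         # length of every toy program (assumption)

@dataclass
class ResearchState:
    """One population member; field names are illustrative, not from the paper."""
    code: str                                        # stand-in for stored code changes
    score: float                                     # performance data, higher is better
    reflections: list = field(default_factory=list)  # notes kept for later decisions
    children: int = 0                                # times this state served as a parent

def evaluate(code: str) -> float:
    # Toy objective: fraction of 'a' characters. A real run would train a model.
    return code.count("a") / len(code)

def novelty(state: ResearchState, population: list) -> float:
    # Assumed novelty measure: mean normalized Hamming distance to other members.
    others = [p for p in population if p is not state]
    if not others:
        return 0.0
    total = sum(sum(a != b for a, b in zip(state.code, o.code)) for o in others)
    return total / (len(others) * len(state.code))

def parent_weight(state: ResearchState, population: list) -> float:
    # Assumed combination of the three criteria named in the abstract.
    productivity = state.score
    coverage = 1.0 / (1 + state.children)  # favor states that were rarely expanded
    return productivity + novelty(state, population) + coverage

def mutate(code: str, rng: random.Random) -> str:
    # Replace one random position with a random symbol.
    i = rng.randrange(len(code))
    return code[:i] + rng.choice(ALPHABET) + code[i + 1:]

def crossover(a: str, b: str, rng: random.Random) -> str:
    # Single-point crossover: prefix of one parent, suffix of the other.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def run_search(steps: int = 200, pop_size: int = 6, seed: int = 0):
    rng = random.Random(seed)
    population = []
    for _ in range(pop_size):
        code = "".join(rng.choice(ALPHABET) for _ in range(LENGTH))
        population.append(ResearchState(code, evaluate(code)))
    initial_best = max(s.score for s in population)

    for _ in range(steps):
        # Sample two parents with probability proportional to their weights.
        weights = [parent_weight(s, population) for s in population]
        first, second = rng.choices(population, weights=weights, k=2)
        first.children += 1
        second.children += 1

        child_code = mutate(crossover(first.code, second.code, rng), rng)
        child = ResearchState(child_code, evaluate(child_code))
        child.reflections.append(
            f"parents scored {first.score:.2f} and {second.score:.2f}; "
            f"child scored {child.score:.2f}"
        )

        # Keep only the strongest pop_size states by score.
        population.append(child)
        population.sort(key=lambda s: s.score, reverse=True)
        del population[pop_size:]

    return initial_best, population

if __name__ == "__main__":
    start, final_population = run_search()
    print("best at start:", start)
    print("best at end:  ", final_population[0].score)
```

Unlike a single-path agent that keeps one program and accepts only strict improvements, this loop retains several candidates at once, so a child built from two mid-ranked parents can still enter the population. Truncating by score alone is a simplification; a survivor rule that also protects novel states would preserve more diversity.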

Problem

Research questions and friction points this paper is trying to address.

autonomous research agents

single-path search

partial ideas

promising directions

failed experiments

Innovation

Methods, ideas, or system contributions that make the work stand out.

Genetic AutoResearch

population-based search

autonomous research agents