🤖 AI Summary
This work addresses the challenge of autonomously discovering novel solutions to open-ended algorithmic problems using large language models (LLMs). We propose AlphaResearch, an LLM-based agent built on a novel dual-environment framework, integrating *execution-based validation* and *simulated peer review*, to establish a reproducible, evaluable research loop. The framework comprises LLM-driven idea generation, dual-path verification (via code execution and academic-style critique simulation), and iterative refinement. To enable rigorous evaluation, we release AlphaResearchComp, a benchmark of eight open-ended algorithmic problems with standardized metrics and reproducibility protocols. Empirical results show that AlphaResearch outperforms human researchers on two of the eight problems; notably, its discovered algorithm for the "packing circles" problem surpasses human-designed solutions and strong baselines such as AlphaEvolve, demonstrating the promise of LLM-driven automated algorithm discovery.
📝 Abstract
Large language models have made significant progress on complex but easy-to-verify problems, yet they still struggle to discover the unknown. In this paper, we present **AlphaResearch**, an autonomous research agent designed to discover new algorithms for open-ended problems. To synergize the feasibility and innovation of the discovery process, we construct a novel dual research environment that combines an execution-based verification environment with a simulated real-world peer-review environment. AlphaResearch discovers new algorithms by iteratively running the following steps: (1) propose new ideas, (2) verify the ideas in the dual research environment, and (3) optimize the research proposals for better performance. To promote a transparent evaluation process, we construct **AlphaResearchComp**, a new evaluation benchmark comprising a competition of eight open-ended algorithmic problems, each carefully curated and verified through executable pipelines, objective metrics, and reproducibility checks. AlphaResearch achieves a 2/8 win rate in head-to-head comparison with human researchers, demonstrating the possibility of accelerating algorithm discovery with LLMs. Notably, the algorithm discovered by AlphaResearch on the *"packing circles"* problem achieves the best known performance, surpassing the results of human researchers and strong baselines from recent work (e.g., AlphaEvolve). Additionally, we conduct a comprehensive analysis of the remaining challenges in the 6/8 failure cases, providing valuable insights for future research.
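The three-step loop described above (propose, verify in the dual environment, refine) can be sketched schematically. The sketch below is a minimal illustration, not the paper's actual implementation: every function and class name is a hypothetical stand-in, the LLM calls are stubbed out, and the "execution-based" check is reduced to running candidate code and reading off a numeric score.

```python
# Illustrative sketch of a dual-environment research loop in the style
# described by the abstract. All names (Idea, propose_idea, etc.) are
# assumptions for exposition, not AlphaResearch's real API.
from dataclasses import dataclass


@dataclass
class Idea:
    description: str
    code: str  # candidate algorithm as executable source


def propose_idea(history):
    # (1) Propose: stand-in for an LLM call conditioned on past feedback.
    return Idea(description=f"attempt {len(history)}",
                code="def solve(): return 42")


def execute_and_score(idea):
    # (2a) Execution-based verification: run the candidate in a fresh
    # namespace and return an objective metric (here, solve()'s value).
    namespace = {}
    try:
        exec(idea.code, namespace)
        return float(namespace["solve"]())
    except Exception:
        return float("-inf")  # failed candidates score worst


def simulated_review(idea):
    # (2b) Simulated peer review: stand-in for a critique-generating
    # LLM call; stubbed as a constant novelty/feasibility score.
    return 1.0


def research_loop(steps=3):
    # (3) Refine: feed scores and reviews back into the next proposal,
    # tracking the best-scoring idea found so far.
    history, best = [], (float("-inf"), None)
    for _ in range(steps):
        idea = propose_idea(history)
        score = execute_and_score(idea)
        review = simulated_review(idea)
        history.append((idea, score, review))
        if score > best[0]:
            best = (score, idea)
    return best


if __name__ == "__main__":
    best_score, best_idea = research_loop()
    print(best_score, best_idea.description)
```

In the paper's actual system, both the proposal and review stubs would be LLM-driven, and the execution environment would enforce the benchmark's pipelines, metrics, and reproducibility checks rather than a single scalar score.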