🤖 AI Summary
Existing code evolution methods suffer from low sample efficiency, often requiring thousands of evaluations, and are closed-source, severely hindering reproducibility and extensibility. This paper introduces ShinkaEvolve: an open-source, sample-efficient LLM-driven program evolution framework. Methodologically, it employs LLMs as mutation operators within an evolutionary agentic harness. Key innovations include parent sampling strategies that balance exploration and exploitation, novelty-based rejection sampling for code candidates, and a bandit-driven mechanism for adaptively selecting among an ensemble of LLMs. Experiments demonstrate that ShinkaEvolve discovers a new state-of-the-art circle-packing solution within only 150 evaluations, and it achieves significant performance gains on AIME mathematical reasoning and ALE-Bench competitive programming benchmarks. Moreover, it autonomously discovers a novel mixture-of-experts (MoE) load-balancing loss function, validating its effectiveness and generality for automated scientific discovery.
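The bandit-driven ensemble selection can be pictured as a multi-armed bandit where each "arm" is an LLM and the reward is the fitness improvement its mutations produce. Below is a minimal sketch using the classic UCB1 rule; the class name, reward definition, and UCB1 itself are illustrative assumptions, since the summary does not specify the paper's exact bandit formulation.

```python
import math

class UCB1EnsembleSelector:
    """UCB1 bandit over a pool of LLMs (illustrative sketch; the paper's
    exact bandit algorithm and reward signal are not given in this summary)."""

    def __init__(self, model_names):
        self.models = list(model_names)
        self.counts = {m: 0 for m in self.models}        # pulls per model
        self.mean_reward = {m: 0.0 for m in self.models}  # running mean reward
        self.total_pulls = 0

    def select(self):
        # Try every model at least once before applying the UCB rule.
        for m in self.models:
            if self.counts[m] == 0:
                return m
        # UCB1: mean reward plus an exploration bonus that shrinks
        # as a model accumulates observations.
        return max(
            self.models,
            key=lambda m: self.mean_reward[m]
            + math.sqrt(2 * math.log(self.total_pulls) / self.counts[m]),
        )

    def update(self, model, reward):
        # Incremental mean update after observing a mutation's fitness gain.
        self.counts[model] += 1
        self.total_pulls += 1
        self.mean_reward[model] += (reward - self.mean_reward[model]) / self.counts[model]
```

In an evolution loop, `select()` would pick which LLM proposes the next mutation, and `update()` would feed back the resulting fitness change, so models whose edits keep improving solutions get sampled more often.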
📝 Abstract
We introduce ShinkaEvolve: a new open-source framework leveraging large language models (LLMs) to advance scientific discovery with state-of-the-art performance and unprecedented efficiency. Recent advances in scaling inference-time compute of LLMs have enabled significant progress in generalized scientific discovery. These approaches rely on evolutionary agentic harnesses that leverage LLMs as mutation operators to generate candidate solutions. However, current code evolution methods suffer from critical limitations: they are sample inefficient, requiring thousands of samples to identify effective solutions, and remain closed-source, hindering broad adoption and extension. ShinkaEvolve addresses these limitations, introducing three key innovations: a parent sampling technique balancing exploration and exploitation, code novelty rejection-sampling for efficient search space exploration, and a bandit-based LLM ensemble selection strategy. We evaluate ShinkaEvolve across diverse tasks, demonstrating consistent improvements in sample efficiency and solution quality. ShinkaEvolve discovers a new state-of-the-art circle packing solution using only 150 samples, designs high-performing agentic harnesses for AIME mathematical reasoning tasks, identifies improvements to ALE-Bench competitive programming solutions, and discovers novel mixture-of-experts load balancing loss functions that illuminate the space of optimization strategies. Our results demonstrate that ShinkaEvolve achieves broad applicability with exceptional sample efficiency. By providing open-source accessibility and cost-efficiency, this work democratizes open-ended discovery across diverse computational problems.
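The code novelty rejection-sampling idea can be sketched as a filter that discards candidate programs too similar to anything already in the archive, before spending an evaluation on them. The sketch below uses token-set Jaccard similarity as a simple stand-in for whatever similarity measure the framework actually uses (an assumption; the class and function names are hypothetical).

```python
def token_set(code: str) -> set:
    """Crude tokenization: whitespace-split tokens of the program text."""
    return set(code.split())

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets, in [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

class NoveltyFilter:
    """Reject candidate programs that are near-duplicates of archived ones.
    Token-set Jaccard is an illustrative proxy, not the paper's actual
    novelty measure."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.archive = []  # token sets of accepted programs

    def accept(self, code: str) -> bool:
        toks = token_set(code)
        if any(jaccard(toks, prev) >= self.threshold for prev in self.archive):
            return False  # too similar: reject before costly evaluation
        self.archive.append(toks)
        return True
```

The design point is that each rejected near-duplicate saves a full fitness evaluation, which is where the sample-efficiency gains of this kind of filtering come from.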