AI Summary
Large language models (LLMs) are typically employed as static generators in automated algorithm discovery, lacking mechanisms to dynamically refine their behavior using feedback from evolutionary search.
Method: We propose an RL-augmented evolutionary search framework that integrates Proximal Policy Optimization (PPO), a policy-gradient reinforcement learning method, into an LLM-driven evolutionary closed loop. Within this framework, the LLM serves as an evolvable search operator, iteratively improving its algorithmic proposals for combinatorial optimization tasks (bin packing, TSP, Flatpack) via reinforcement signals derived from solution quality.
Contribution/Results: This work presents the first end-to-end joint optimization of RL and LLM-based evolutionary search, moving beyond conventional prompt engineering paradigms. Experiments demonstrate substantial improvements in both the efficiency of discovering high-performing novel algorithms and their generalization across problem instances, validating the effectiveness and scalability of dynamic model evolution for automated algorithm design.
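To make "reinforcement signals derived from solution quality" concrete, here is a minimal sketch for the bin packing task. It assumes an online setting in which a candidate heuristic is a priority function over open bins, and it assumes a reward equal to a simple lower bound on the bin count divided by the number of bins actually used. The names `pack`, `reward`, `best_fit`, and `worst_fit` are hypothetical, and the reward definition is an illustration, not the one used in the paper.

```python
import math
from typing import Callable, List, Sequence

Priority = Callable[[int, int], float]  # (item size, remaining space) -> score


def pack(items: Sequence[int], capacity: int, priority: Priority) -> int:
    """Place each item in the feasible open bin with the highest priority.

    A new bin is opened only when no open bin can hold the item.
    Returns the number of bins used.
    """
    remaining: List[int] = []  # free space left in each open bin
    for item in items:
        feasible = [i for i, free in enumerate(remaining) if free >= item]
        if feasible:
            best = max(feasible, key=lambda i: priority(item, remaining[i]))
            remaining[best] -= item
        else:
            remaining.append(capacity - item)
    return len(remaining)


def reward(items: Sequence[int], capacity: int, priority: Priority) -> float:
    """Assumed reward: lower bound on bins / bins used, so 1.0 is the best case."""
    lower_bound = math.ceil(sum(items) / capacity)
    return lower_bound / pack(items, capacity, priority)


def best_fit(item: int, free: int) -> float:
    """Prefer the bin that leaves the least slack."""
    return -(free - item)


def worst_fit(item: int, free: int) -> float:
    """Prefer the bin that leaves the most slack."""
    return free - item


items, capacity = [5, 6, 4, 5], 10
print(pack(items, capacity, best_fit), reward(items, capacity, best_fit))
print(pack(items, capacity, worst_fit), reward(items, capacity, worst_fit))
```

On this small instance the best-fit heuristic uses 2 bins and the worst-fit heuristic uses 3, so a scalar of this kind separates the two candidate algorithms and could serve as the feedback that an RL update consumes.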
Abstract
Discovering efficient algorithms for solving complex problems has long been an outstanding challenge in mathematics and computer science, one that requires substantial human expertise. Recent advances in evolutionary search with large language models (LLMs) have shown promise in accelerating the discovery of algorithms across various domains, particularly in mathematics and optimization. However, existing approaches treat the LLM as a static generator, missing the opportunity to update the model with the signal obtained from evolutionary exploration. In this work, we propose to augment LLM-based evolutionary search by continuously refining the search operator (the LLM) through reinforcement learning (RL) fine-tuning. Our method uses evolutionary search as an exploration strategy to discover improved algorithms, while RL optimizes the LLM policy based on these discoveries. Our experiments on three combinatorial optimization tasks (bin packing, traveling salesman, and the flatpack problem) show that combining RL and evolutionary search improves the efficiency of discovering improved algorithms, showcasing the potential of RL-enhanced evolutionary strategies to help computer scientists and mathematicians design algorithms more efficiently.
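The interplay between the two components can be sketched with a toy loop in which evolutionary search explores and an RL update refines the search operator. This is only an illustration under heavy assumptions: a softmax policy over four hypothetical mutation steps (`STEPS`) stands in for the LLM, a one-dimensional objective (`fitness`) stands in for algorithm quality, and a REINFORCE-style update with a moving-average baseline is swapped in for the PPO fine-tuning the summary mentions. Every name and hyperparameter here is made up for the example.

```python
import math
import random

# Hypothetical mutation operators; a stand-in for edits proposed by an LLM.
STEPS = [-1.0, -0.1, 0.1, 1.0]


def fitness(x: float) -> float:
    """Toy objective standing in for solution quality; maximum at x = 3."""
    return -(x - 3.0) ** 2


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run(generations=200, pop_size=8, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(STEPS)  # policy parameters (the "search operator")
    population = [rng.uniform(-10.0, -5.0) for _ in range(pop_size)]
    baseline = 0.0
    history = []  # best fitness after each generation
    for _ in range(generations):
        # Exploration: sample a parent and let the policy propose a child.
        parent = rng.choice(population)
        probs = softmax(logits)
        action = rng.choices(range(len(STEPS)), weights=probs)[0]
        child = parent + STEPS[action]
        # Reward (assumed): improvement over the parent, clipped to [-1, 1].
        gain = fitness(child) - fitness(parent)
        reward = max(-1.0, min(1.0, gain))
        # RL step: REINFORCE on the softmax logits with a moving baseline.
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)
        for i in range(len(STEPS)):
            grad_log_prob = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad_log_prob
        # Selection: keep the fittest individuals (elitist truncation).
        population = sorted(population + [child], key=fitness, reverse=True)
        population = population[:pop_size]
        history.append(fitness(population[0]))
    return population, softmax(logits), history


population, probs, history = run()
print("best fitness:", history[-1])
print("policy over steps:", [round(p, 3) for p in probs])
```

Because selection is elitist, the best fitness never decreases from one generation to the next. The policy update is what distinguishes this from a static generator: steps that produced improvements become more likely in later generations.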