🤖 AI Summary
This work addresses key limitations of large language models (LLMs) in evolutionary search, namely context contamination, mode collapse, and weak collaboration, which stem from the absence of a systematic optimization framework. To overcome these challenges, we propose PACEvolve, a novel framework that integrates Hierarchical Context Management (HCM), Momentum-Based Backtracking (MBB), and Adaptive Coevolution (CE). By employing dynamic sampling and pruning strategies, PACEvolve enables robust control over the evolutionary process, supporting consistent, long-horizon autonomous optimization. Our method achieves state-of-the-art performance on the LLM-SR and KernelBench benchmarks and discovers solutions on the Modded NanoGPT task that surpass existing records.
📝 Abstract
Large Language Models (LLMs) have emerged as powerful operators for evolutionary search, yet the design of efficient search scaffolds remains ad hoc. While promising, current LLM-in-the-loop systems lack a systematic approach to managing the evolutionary process. We identify three distinct failure modes: Context Pollution, where experiment history biases future candidate generation; Mode Collapse, where agents stagnate in local minima due to a poor exploration-exploitation balance; and Weak Collaboration, where rigid crossover strategies fail to leverage parallel search trajectories effectively. To address these challenges, we introduce Progress-Aware Consistent Evolution (PACEvolve), a framework designed to robustly govern the agent's context and search dynamics. PACEvolve combines hierarchical context management (HCM) with pruning to counter context pollution; momentum-based backtracking (MBB) to escape local minima; and a self-adaptive sampling policy that unifies backtracking and crossover for dynamic search coordination (CE), allowing agents to balance internal refinement with cross-trajectory collaboration. We demonstrate that PACEvolve provides a systematic path to consistent, long-horizon self-improvement, achieving state-of-the-art results on LLM-SR and KernelBench while discovering solutions that surpass the record on Modded NanoGPT.
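To make the backtracking-as-sampling idea concrete, here is a minimal sketch of a momentum-weighted parent-selection policy. This is an illustrative assumption, not the paper's actual MBB rule (which the abstract does not specify): each search node carries a score and a recent-gain ("momentum") term, and sampling is biased toward nodes that are still improving, so a stagnating frontier can be abandoned in favor of a promising ancestor.

```python
import math
import random

def pick_parent(candidates, momentum=0.7, temperature=1.0):
    """Hypothetical momentum-weighted backtracking policy.

    candidates: list of (score, recent_gain) tuples, one per node in
    the search tree (both ancestors and the current frontier).
    Nodes with high recent gains get extra sampling weight, so the
    search can back off a stalled frontier to a still-improving node.
    """
    # Softmax over score plus momentum-scaled recent gain.
    weights = [
        math.exp((score + momentum * gain) / temperature)
        for score, gain in candidates
    ]
    # Roulette-wheel selection proportional to the weights.
    r = random.random() * sum(weights)
    for cand, w in zip(candidates, weights):
        r -= w
        if r <= 0:
            return cand
    return candidates[-1]
```

Under this toy policy, a high-scoring but stalled node (zero recent gain) can lose parent slots to a lower-scoring ancestor whose recent mutations are improving, which is one simple way to realize the exploration-exploitation trade-off the abstract describes.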