🤖 AI Summary
This work proposes a training-free multi-agent optimization framework designed to simultaneously satisfy correctness and performance requirements for system code under the constraint of accessing large language models solely through API calls. The approach decouples the optimization context into three orthogonal dimensions—semantic summarization, directional guidance, and experience sampling—and, for the first time, establishes a functional isomorphism in textual latent space that emulates reinforcement learning components: state representation, policy gradients, and experience replay. Efficient directed evolution is achieved through multi-agent collaboration, code-language abstraction, trajectory-guided directional distillation, and priority-based exemplar retrieval. Evaluated on the ADRS benchmark, the method outperforms the current state-of-the-art by 33.3% in performance while reducing token consumption by 29.0%.
📝 Abstract
Large language models are transforming systems research by automating the discovery of performance-critical algorithms for computer systems. Although LLMs generate plausible code, producing solutions that meet the stringent correctness and performance requirements of systems demands iterative optimization. Test-time reinforcement learning offers high search efficiency but requires parameter updates that are infeasible under API-only access, while existing training-free evolutionary methods suffer from inefficient context utilization and undirected search. We introduce ContextEvolve, a multi-agent framework that achieves RL-level search efficiency under strict parameter-blind constraints by decomposing the optimization context into three orthogonal dimensions: a Summarizer Agent condenses semantic state via code-to-language abstraction, a Navigator Agent distills optimization direction from trajectory analysis, and a Sampler Agent curates the experience distribution through prioritized exemplar retrieval. This orchestration forms a functional isomorphism with RL (mapping to state representation, policy gradient, and experience replay), enabling principled optimization in a textual latent space. On the ADRS benchmark, ContextEvolve outperforms state-of-the-art baselines by 33.3% while reducing token consumption by 29.0%. Code for our work is released at https://anonymous.4open.science/r/ContextEvolve-ACC
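To make the three-agent orchestration concrete, here is a minimal, heavily simplified sketch of such a loop. All function names and internals are hypothetical stand-ins invented for illustration (the paper's actual agents are LLM calls over an API); only the division of roles — summarized state, trajectory-derived direction, prioritized exemplar sampling — follows the description above.

```python
import heapq
import random

def summarize(code: str) -> str:
    # Summarizer Agent (stand-in): condense a candidate into a language state.
    # The real system would prompt an LLM for a semantic summary.
    return f"candidate of {len(code)} chars"

def navigate(trajectory: list[tuple[float, str]]) -> str:
    # Navigator Agent (stand-in): read the score trajectory for a direction,
    # the textual analogue of a policy-gradient signal.
    if len(trajectory) >= 2 and trajectory[-1][0] <= trajectory[-2][0]:
        return "revert toward the best-scoring variant"
    return "continue refining the current variant"

def sample_exemplars(buffer: list[tuple[float, str]], k: int = 2) -> list[str]:
    # Sampler Agent (stand-in): prioritized retrieval of top-scoring exemplars
    # from the buffer, the analogue of prioritized experience replay.
    return [code for _, code in heapq.nlargest(k, buffer, key=lambda x: x[0])]

def propose(state: str, direction: str, exemplars: list[str]) -> str:
    # Placeholder for the API-only LLM call that emits a new candidate
    # conditioned on state, direction, and exemplars.
    return "x" * random.randint(10, 50)

def score(code: str) -> float:
    # Placeholder fitness; the real system runs correctness and perf tests.
    return -abs(len(code) - 30)

def optimize(seed: str, steps: int = 20) -> str:
    buffer = [(score(seed), seed)]   # experience buffer
    trajectory = list(buffer)        # optimization trajectory
    for _ in range(steps):
        state = summarize(trajectory[-1][1])
        direction = navigate(trajectory)
        exemplars = sample_exemplars(buffer)
        candidate = propose(state, direction, exemplars)
        s = score(candidate)
        trajectory.append((s, candidate))
        buffer.append((s, candidate))
    return max(buffer, key=lambda x: x[0])[1]
```

The point of the decomposition is that each agent compresses one axis of context (state, direction, experience) before the next generation call, rather than replaying the full history into the prompt.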