🤖 AI Summary
This paper addresses the stagnation in performance, code redundancy, and stylistic rigidity commonly observed when large language models (LLMs) iteratively generate algorithms within evolutionary computation frameworks. To model the dynamic iterative trajectory of LLM-generated code, we propose the **Code Evolution Graph**, the first formal graph-based representation of such evolution. Leveraging static analysis, graph representation learning, and cross-model behavioral comparison across three benchmark task categories, we empirically reveal: (i) iterative generation often increases code complexity while degrading performance; (ii) generated code exhibits significant heterogeneity and stylistic isolation across models; and (iii) repeated prompting induces redundant overcomplication. Building on these insights, we introduce a **multi-LLM co-evolution paradigm**, which demonstrably mitigates degeneration and improves solution quality. Our approach provides an interpretable, controllable pathway for LLM-driven automated algorithm design.
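To make the Code Evolution Graph idea concrete, the sketch below models it as a directed graph whose nodes are generated code variants (annotated with metrics such as fitness and static complexity) and whose edges record which variant was mutated into which. All names here (`CodeNode`, `CodeEvolutionGraph`, `add_variant`, `lineage`, and the example metrics) are illustrative assumptions, not the paper's actual data structures.

```python
# Illustrative sketch of a Code Evolution Graph (hypothetical API, not
# the paper's implementation): nodes are LLM-generated code variants,
# edges record parent -> child derivation across prompting iterations.
from dataclasses import dataclass, field

@dataclass
class CodeNode:
    code: str        # the generated source text
    generation: int  # iteration at which it was produced
    fitness: float   # e.g. benchmark score of the algorithm
    complexity: int  # e.g. AST node count from static analysis

@dataclass
class CodeEvolutionGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> CodeNode
    edges: list = field(default_factory=list)  # (parent_id, child_id)

    def add_variant(self, node_id, node, parent_id=None):
        self.nodes[node_id] = node
        if parent_id is not None:
            self.edges.append((parent_id, node_id))

    def lineage(self, node_id):
        """Walk back to the root to see how a solution evolved."""
        parents = {child: parent for parent, child in self.edges}
        path = [node_id]
        while path[-1] in parents:
            path.append(parents[path[-1]])
        return list(reversed(path))

g = CodeEvolutionGraph()
g.add_variant("v0", CodeNode("def solve(x): return x", 0, 0.40, 5))
g.add_variant("v1", CodeNode("def solve(x): return x * 2", 1, 0.55, 7),
              parent_id="v0")
g.add_variant("v2", CodeNode("def solve(x): return x * 2 + 1", 2, 0.52, 9),
              parent_id="v1")

print(g.lineage("v2"))  # root-to-leaf derivation chain
```

Tracing a leaf's lineage this way is what enables the paper's kind of analysis, e.g. observing that complexity rises along a chain (5 → 7 → 9 above) even while fitness plateaus or drops.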
📄 Abstract
Large Language Models (LLMs) have demonstrated great promise in generating code, especially when used inside an evolutionary computation framework to iteratively optimize the generated algorithms. However, in some cases they fail to generate competitive algorithms or the code optimization stalls, and we are left with no recourse because of a lack of understanding of the generation process and the generated code. We present a novel approach to mitigate this problem by enabling users to analyze the code generated inside the evolutionary process and how it evolves over repeated prompting of the LLM. We show results for three benchmark problem classes and demonstrate novel insights. In particular, LLMs tend to generate more complex code with repeated prompting, but the additional complexity can hurt algorithmic performance in some cases. Different LLMs have different coding "styles", and the code each generates tends to be dissimilar to that of other LLMs. These two findings suggest that using several different LLMs inside the code evolution framework might produce higher-performing code than using only one LLM.
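The multi-LLM co-evolution idea suggested by these findings can be sketched as a simple loop: each generation, every model proposes a candidate from the current best solution, and the best candidate (by a fitness function) survives. The code below is a toy illustration under stated assumptions: the two "LLMs" are stubbed as random perturbation functions on a numeric solution, and the fitness function is a toy objective; none of this reflects the paper's actual framework.

```python
# Toy sketch of multi-LLM co-evolution (stubbed models, not real LLMs):
# per generation, each "model" proposes a variant of the current best
# solution; selection keeps whichever candidate scores highest.
import random

random.seed(0)  # reproducible toy run

def llm_a(parent):
    # stub for one model's "style": biased toward upward tweaks
    return parent + random.uniform(-0.1, 0.3)

def llm_b(parent):
    # stub for a second model with a different bias
    return parent + random.uniform(-0.3, 0.1)

def fitness(x):
    # toy objective: solutions closer to 1.0 score higher
    return -abs(x - 1.0)

def co_evolve(models, start=0.0, generations=20):
    best = start
    for _ in range(generations):
        candidates = [model(best) for model in models]
        # elitist selection: the incumbent survives unless beaten
        best = max(candidates + [best], key=fitness)
    return best

result = co_evolve([llm_a, llm_b])
```

Because selection is elitist (the incumbent is always among the candidates), fitness is monotonically non-decreasing; the intuition from the paper is that drawing candidates from models with dissimilar styles widens the search around the incumbent compared with prompting a single model repeatedly.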