🤖 AI Summary
To address the prohibitively high computational cost of large language models (LLMs) in code generation—stemming from frequent human-in-the-loop interactions and excessive token consumption—this paper proposes a lightweight evolutionary algorithm (EA)-driven prompt optimization method. It integrates EA operators (mutation, selection, and execution-feedback-based fitness evaluation) into prompt engineering, enabling the automatic evolution of high-quality zero-shot or few-shot prompts with minimal LLM invocations. Unlike conventional iterative paradigms that rely on repeated execution feedback and therefore incur substantial overhead, the approach breaks the cost bottleneck while maintaining efficacy. Evaluated across multiple code generation benchmarks, it outperforms state-of-the-art methods, achieving up to a 3.2× improvement in code correctness per unit cost and reducing average token consumption by 67%. The core contribution is a novel low-overhead, low-interaction, high-accuracy paradigm for automated prompt optimization.
📝 Abstract
Large Language Models (LLMs) have seen increasing use in various software development tasks, especially code generation. The most advanced recent methods incorporate feedback from code execution into prompts, iteratively guiding LLMs toward generating correct code. While effective, these methods can be costly and time-consuming due to the numerous interactions with the LLM and the extensive token usage. To address this issue, we propose an alternative approach named Evolutionary Prompt Engineering for Code (EPiC), which leverages a lightweight evolutionary algorithm to evolve the original prompts toward better ones that produce high-quality code, with minimal interactions with the LLM. Our evaluation against state-of-the-art (SOTA) LLM-based code generation models shows that EPiC outperforms all the baselines in terms of cost-effectiveness.
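The evolutionary loop the abstract describes can be sketched as follows. This is a minimal illustration, not EPiC's actual implementation: `generate_code`, `fitness`, and `mutate` are hypothetical stubs standing in for the LLM call, the execution-feedback-based fitness evaluation, and the LLM-driven prompt mutation, respectively.

```python
import random

random.seed(0)  # reproducible demo

def generate_code(prompt: str) -> str:
    # Stub: in the real system an LLM would generate code from the prompt.
    return f"def solution(): ...  # generated from: {prompt}"

def fitness(prompt: str) -> float:
    # Stub: real fitness would be the fraction of test cases passed by
    # the generated code. Here we approximate with prompt length for demo.
    generate_code(prompt)
    return min(len(prompt) / 200.0, 1.0)

def mutate(prompt: str) -> str:
    # Stub: in the real system an LLM would rephrase or refine the prompt.
    hints = ["Return only valid Python.", "Handle edge cases.",
             "Include type hints."]
    return prompt + " " + random.choice(hints)

def evolve(seed_prompt: str, pop_size: int = 4, generations: int = 3) -> str:
    """Lightweight (mu + lambda)-style loop over candidate prompts."""
    population = [seed_prompt] + [mutate(seed_prompt) for _ in range(pop_size - 1)]
    for _ in range(generations):
        offspring = [mutate(p) for p in population]
        # Selection: keep the fittest prompts from parents + offspring.
        population = sorted(population + offspring,
                            key=fitness, reverse=True)[:pop_size]
        if fitness(population[0]) >= 1.0:  # early stop: all tests pass
            break
    return population[0]

seed = "Write a Python function that sorts a list."
best = evolve(seed)
```

Because each mutation only appends text and the stub fitness grows with length, the loop monotonically improves the candidate; with a real execution-feedback fitness, the early-stop condition is what keeps LLM invocations minimal.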