🤖 AI Summary
Problem: Small general-purpose large language models (LLMs) struggle on complex tasks whose discrete prompts must carry richly detailed instructions. Existing automated prompt optimization methods (e.g., PromptWizard, OPRO, RL-Prompt) have limited effectiveness on such models and can cause severe performance degradation.
Method: We propose a two-phase evolutionary search framework. In the first phase, grammar-guided genetic programming synthesizes prompt-generation programs, searching a space of function compositions of syntactic, dictionary-based, and LLM-based prompt-editing functions. In the second phase, local search explores the neighborhoods of the best-performing programs to fine-tune them further.
Contribution/Results: Evaluated on four challenging domain-specific tasks with three relatively small general-purpose LLMs, our approach outperforms PromptWizard, OPRO, and RL-Prompt. It improves performance in almost all task-model combinations and incurs only minimal degradation in the cases where it does not.
📝 Abstract
Prompt engineering has proven to be a crucial step in leveraging pretrained large language models (LLMs) in solving various real-world tasks. Numerous solutions have been proposed that seek to automate prompt engineering by using the model itself to edit prompts. However, the majority of state-of-the-art approaches are evaluated on tasks that require minimal prompt templates and on very large and highly capable LLMs. In contrast, solving complex tasks that require detailed information to be included in the prompt increases the amount of text that needs to be optimised. Furthermore, smaller models have been shown to be more sensitive to prompt design. To address these challenges, we propose an evolutionary search approach to automated discrete prompt optimisation consisting of two phases. In the first phase, grammar-guided genetic programming is invoked to synthesise prompt-creating programmes by searching the space of programmes populated by function compositions of syntactic, dictionary-based and LLM-based prompt-editing functions. In the second phase, local search is applied to explore the neighbourhoods of best-performing programmes in an attempt to further fine-tune their performance. Our approach outperforms three state-of-the-art prompt optimisation approaches, PromptWizard, OPRO, and RL-Prompt, on three relatively small general-purpose LLMs in four domain-specific challenging tasks. We also illustrate several examples where these benchmark methods suffer relatively severe performance degradation, while our approach improves performance in almost all task-model combinations, only incurring minimal degradation when it does not.
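The two phases described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the editing functions, the toy grammar (a program is a sequence of one to `MAX_LEN` edits), the `toy_fitness` scorer, and all parameter values are hypothetical stand-ins. In a real setting, fitness would come from running a small LLM on task examples, and the LLM-based edit would call a model rather than a string replacement.

```python
import random

# Hypothetical prompt-editing primitives (illustrative only).
def strip_whitespace(p):        # syntactic edit: collapse repeated spaces
    return " ".join(p.split())

def add_step_hint(p):           # syntactic edit: append a reasoning cue once
    return p if "step by step" in p else p + " Think step by step."

SYNONYMS = {"short": "concise", "question": "query"}  # toy dictionary

def swap_synonyms(p):           # dictionary-based edit
    return " ".join(SYNONYMS.get(w, w) for w in p.split(" "))

def llm_paraphrase_stub(p):     # placeholder for an LLM-based edit
    return p.replace("Please ", "")

EDITS = [strip_whitespace, add_step_hint, swap_synonyms, llm_paraphrase_stub]
MAX_LEN = 4  # toy grammar constraint: program ::= edit{1..MAX_LEN}

def run_program(program, prompt):
    """Apply a composition of edit functions to a seed prompt."""
    for edit in program:
        prompt = edit(prompt)
    return prompt

def toy_fitness(prompt):
    """Stand-in for scoring a prompt with an LLM on validation data."""
    return (("step by step" in prompt) + ("concise" in prompt)
            + ("Please" not in prompt) + ("  " not in prompt))

def random_program(rng):
    return [rng.choice(EDITS) for _ in range(rng.randint(1, MAX_LEN))]

def mutate(program, rng):
    child = list(program)
    child[rng.randrange(len(child))] = rng.choice(EDITS)
    return child

def crossover(a, b, rng):
    child = (a[:rng.randint(0, len(a))] + b[rng.randint(0, len(b)):])[:MAX_LEN]
    return child or [rng.choice(EDITS)]  # keep programs non-empty

def evolve(seed_prompt, rng, pop_size=20, generations=15):
    """Phase 1: evolutionary search over edit programs."""
    score = lambda prog: toy_fitness(run_program(prog, seed_prompt))
    pop = [random_program(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=score)

def local_search(program, seed_prompt):
    """Phase 2: hill-climb over single-edit replacements and appends."""
    score = lambda prog: toy_fitness(run_program(prog, seed_prompt))
    while True:
        neighbours = [program[:i] + [e] + program[i + 1:]
                      for i in range(len(program)) for e in EDITS]
        if len(program) < MAX_LEN:
            neighbours += [program + [e] for e in EDITS]
        best = max(neighbours, key=score)
        if score(best) <= score(program):
            return program
        program = best

seed = "Please answer the  question in a short way."
best = local_search(evolve(seed, random.Random(0)), seed)
print([f.__name__ for f in best], "->", run_program(best, seed))
```

Because the result is a program rather than a single prompt string, the same sequence of edits can be re-applied to a different seed prompt, which is the sense in which the search operates over prompt-creating programs.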