🤖 AI Summary
Existing prompt optimization methods rely on LLM-driven stochastic rewriting, suffering from premature convergence to local optima, unstable performance, and poor cross-task transferability. This paper proposes DelvePO, a task-agnostic prompt optimization framework based on self-evolution. Its core innovation is a direction-guided self-evolution mechanism: it decouples prompts into components to enable interpretable analysis of influencing factors, and incorporates a working memory module to mitigate model uncertainty, thereby enhancing optimization stability and generalization. DelvePO is compatible with both open-source (e.g., Llama, Qwen) and closed-source (e.g., GPT series) LLMs. Extensive experiments across diverse domains show consistent improvements over state-of-the-art methods, and empirical validation on DeepSeek-R1-Distill-Llama-8B, Qwen2.5-7B-Instruct, and GPT-4o-mini confirms its effectiveness and transferability across architectures and tasks.
📝 Abstract
Prompt Optimization has emerged as a crucial approach due to its capability to steer Large Language Models to solve various tasks. However, current works mainly rely on the random rewriting ability of LLMs, and the optimization process generally focuses on specific influencing factors, which makes it easy to fall into local optima. Besides, the performance of the optimized prompt is often unstable, which limits its transferability across different tasks. To address these challenges, we propose $\textbf{DelvePO}$ ($\textbf{D}$irection-Guid$\textbf{e}$d Se$\textbf{l}$f-E$\textbf{v}$olving Framework for Fl$\textbf{e}$xible $\textbf{P}$rompt $\textbf{O}$ptimization), a task-agnostic framework that optimizes prompts in a self-evolving manner. In our framework, we decouple prompts into different components that can be used to explore the impact that different factors may have on various tasks. On this basis, we introduce a working memory, through which LLMs can alleviate the deficiencies caused by their own uncertainties and further derive key insights to guide the generation of new prompts. We conduct extensive experiments on tasks covering various domains with both open- and closed-source LLMs, including DeepSeek-R1-Distill-Llama-8B, Qwen2.5-7B-Instruct, and GPT-4o-mini. Experimental results show that DelvePO consistently outperforms previous SOTA methods under identical experimental settings, demonstrating its effectiveness and transferability across different tasks.
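To make the described mechanism concrete, here is a minimal sketch of a direction-guided self-evolving loop over decoupled prompt components with a working memory. All names, the component set, the mutation step, and the scoring function are illustrative assumptions for exposition; in DelvePO itself, rewriting and evaluation would be performed by an LLM on the target task.

```python
import random

# Hypothetical sketch of a direction-guided self-evolving prompt loop.
# Component names, mutate(), and score() are illustrative stand-ins,
# not the paper's actual implementation.

COMPONENTS = ["role", "instruction", "constraint", "example"]

def mutate(prompt: dict, component: str) -> dict:
    """Stand-in for an LLM rewriting one decoupled component."""
    child = dict(prompt)
    child[component] = prompt[component] + " (revised)"
    return child

def score(prompt: dict) -> float:
    """Stand-in for task evaluation; here, longer prompts score higher."""
    return float(sum(len(v) for v in prompt.values()))

def evolve(prompt: dict, rounds: int = 3):
    memory = []  # working memory: (component, score delta) insights
    best, best_score = prompt, score(prompt)
    for _ in range(rounds):
        # Direction guidance: prefer the component whose past edits
        # improved the score most; fall back to a random choice.
        if memory:
            gains = {c: sum(d for comp, d in memory if comp == c)
                     for c in COMPONENTS}
            component = max(gains, key=gains.get)
        else:
            component = random.choice(COMPONENTS)
        child = mutate(best, component)
        delta = score(child) - best_score
        memory.append((component, delta))  # record insight
        if delta > 0:  # keep the child only if it improves the score
            best, best_score = child, best_score + delta
    return best, best_score, memory

seed = {c: f"<{c}>" for c in COMPONENTS}
best_prompt, best_score, memory = evolve(seed)
```

The working memory here simply accumulates per-component score deltas; the direction step then biases mutation toward historically productive components rather than rewriting the whole prompt at random.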