🤖 AI Summary
Prompt engineering relies heavily on human intuition and often misses the subtle semantic cues that steer LLM behavior. To address this bottleneck, the paper proposes an end-to-end, few-shot prompt generation framework based on reinforcement learning. The method integrates Proximal Policy Optimization (PPO) for policy refinement, LLM-based self-feedback reward modeling, and semantic consistency constraints to jointly optimize prompts and their corresponding responses, enabling it to synthesize high-quality, novel in-context examples that were not seen during training. Empirical evaluation on text classification, simplification, and summarization tasks shows improvements over the baselines APE and EvoPrompt, with gains over APE of 2.58% in classification accuracy, 4.32 in average ROUGE, and 6.93 in SARI. The work targets the generalization limitations of existing gradient-based and evolutionary approaches and offers a route toward automated, generalizable prompt engineering.
📝 Abstract
Effective prompt engineering remains a central challenge in fully harnessing the capabilities of LLMs. While well-designed prompts can dramatically enhance performance, crafting them typically demands expert intuition and a nuanced understanding of the task. Moreover, the most impactful prompts often hinge on subtle semantic cues, ones that may elude human perception but are crucial for guiding LLM behavior. In this paper, we introduce PRL (Prompts from Reinforcement Learning), a novel RL-based approach for automatic prompt generation. Unlike previous methods, PRL can produce novel few-shot examples that were not seen during training. Our approach achieves state-of-the-art performance across a range of benchmarks, including text classification, simplification, and summarization. On the classification task, it outperforms APE by 2.58% and EvoPrompt by 1.00%. On the summarization task, it improves the average ROUGE score by 4.32 over APE and by 2.12 over EvoPrompt. On the simplification task, it improves the SARI score by 6.93 over APE and by 6.01 over EvoPrompt. Our code is available at https://github.com/Batorskq/prl.