🤖 AI Summary
Existing automatic prompt optimization methods primarily focus on direct prompt refinement or model fine-tuning, overlooking large language models’ (LLMs) inherent capacity for reasoning-based learning via comparative examples. This paper proposes the **Contrastive Reasoning Prompt Optimization (CRPO)** framework—a retrieval-augmented approach that, for the first time, formalizes prompt optimization as a hierarchical, multi-dimensional contrastive reasoning process. CRPO constructs a high-quality prompt retrieval library from HelpSteer2 and performs reflective optimization by analyzing the discrepancies between high- and low-quality prompts along dimensions such as helpfulness, correctness, and coherence. This design makes the optimization both interpretable and robust. Experimental results show that CRPO significantly outperforms state-of-the-art baselines on the HelpSteer2 benchmark, validating that combining contrastive reasoning with retrieval augmentation improves prompt generation quality.
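The multi-dimensional idea described above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code: the function names (`best_per_dimension`, `merge_instruction`) and the prompt wording are assumptions; only the five HelpSteer2 dimensions come from the source.

```python
# Hypothetical sketch of CRPO's multi-metric contrastive step: pick the
# best reference prompt along each HelpSteer2 dimension, then ask an LLM
# to merge their strengths into one optimized prompt.

DIMENSIONS = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]

def best_per_dimension(references):
    """references: list of dicts with a 'text' field plus one score per dimension.
    Returns the top-scoring reference prompt text for each dimension."""
    return {dim: max(references, key=lambda r: r[dim])["text"] for dim in DIMENSIONS}

def merge_instruction(task, exemplars):
    """Assemble the reflective instruction handed to the LLM (wording is illustrative)."""
    lines = [f"Best prompt for {dim}: {text}" for dim, text in exemplars.items()]
    return (
        f"Prompt to optimize: {task}\n"
        + "\n".join(lines)
        + "\nCombine the strengths of each exemplar into a single improved prompt."
    )
```

The actual framework would send the assembled instruction to an LLM; here the integration step is left to the model rather than hard-coded.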
📝 Abstract
Automatic prompt optimization has recently emerged as a strategy for improving the quality of prompts used in Large Language Models (LLMs), with the goal of generating more accurate and useful responses. However, most prior work focuses on direct prompt refinement or model fine-tuning, overlooking the potential of leveraging LLMs' inherent reasoning capability to learn from contrasting examples. In this paper, we present Contrastive Reasoning Prompt Optimization (CRPO), a novel framework that formulates prompt optimization as a retrieval-augmented reasoning process. Our approach retrieves the top-k reference prompts from the HelpSteer2 dataset, an open-source collection annotated for helpfulness, correctness, coherence, complexity, and verbosity, and constructs two complementary optimization paradigms: (1) tiered contrastive reasoning, where the LLM compares high-, medium-, and low-quality prompts to refine its own generation through reflective reasoning, and (2) multi-metric contrastive reasoning, where the LLM analyzes the best prompts along each evaluation dimension and integrates their strengths into an optimized prompt. By explicitly contrasting high- and low-quality exemplars, CRPO enables the model to deduce why certain prompts succeed while others fail, thereby achieving more robust and interpretable optimization. Experimental results on the HelpSteer2 benchmark demonstrate that CRPO significantly outperforms baselines. Our findings highlight the promise of contrastive, retrieval-augmented reasoning for advancing automatic prompt optimization.
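The retrieval-plus-tiering pipeline in paradigm (1) can be outlined as follows. This is a minimal sketch under stated assumptions: the retrieval step is a placeholder (real retrieval would use embedding similarity against the HelpSteer2 library), and the tier split and prompt wording are illustrative choices, not the paper's.

```python
# Hypothetical sketch of CRPO's tiered contrastive reasoning:
# retrieve k reference prompts, split them into quality tiers, and build
# a reflective instruction asking the LLM to contrast the tiers.

from dataclasses import dataclass

@dataclass
class RefPrompt:
    text: str
    helpfulness: float  # HelpSteer2-style quality score

def retrieve_top_k(library, k):
    # Placeholder retrieval: take the k highest-scoring references.
    # A real system would rank by semantic similarity to the task prompt.
    return sorted(library, key=lambda p: p.helpfulness, reverse=True)[:k]

def split_tiers(prompts):
    # Partition retrieved prompts into high/medium/low thirds by score.
    ranked = sorted(prompts, key=lambda p: p.helpfulness, reverse=True)
    n = len(ranked)
    return ranked[: n // 3], ranked[n // 3 : 2 * n // 3], ranked[2 * n // 3 :]

def build_contrastive_prompt(task, high, medium, low):
    # Assemble the reflective instruction (wording is illustrative).
    def section(name, group):
        return f"{name} examples:\n" + "\n".join(f"- {p.text}" for p in group)
    return (
        f"Prompt to optimize: {task}\n\n"
        + section("High-quality", high) + "\n\n"
        + section("Medium-quality", medium) + "\n\n"
        + section("Low-quality", low) + "\n\n"
        + "Explain why the high-quality prompts outperform the low-quality ones, "
        + "then rewrite the prompt above to match the strengths you identified."
    )
```

The final instruction would be sent to the LLM, whose reflective answer yields the optimized prompt; the contrast across tiers is what lets the model infer why some prompts fail.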