🤖 AI Summary
Existing automatic prompt optimization methods primarily focus on direct prompt refinement or model fine-tuning, overlooking large language models’ (LLMs) inherent capacity for reasoning-based learning via comparative examples. This paper proposes the **Contrastive Reasoning Prompt Optimization (CRPO)** framework—a retrieval-augmented approach that, for the first time, formalizes prompt optimization as a hierarchical, multi-dimensional contrastive reasoning process. CRPO constructs a high-quality prompt retrieval library from HelpSteer2 and performs reflective optimization by analyzing the discrepancies between high- and low-quality prompts along dimensions such as helpfulness, correctness, and coherence. This design makes the optimization both interpretable and robust. Experimental results show that CRPO significantly outperforms state-of-the-art baselines on the HelpSteer2 benchmark, validating that combining contrastive reasoning with retrieval augmentation improves prompt generation quality.
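The multi-dimensional idea described above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code: the function names (`best_per_dimension`, `merge_instruction`) and the prompt wording are assumptions; only the five HelpSteer2 dimensions come from the source.

```python
# Hypothetical sketch of CRPO's multi-metric contrastive step: pick the
# best reference prompt along each HelpSteer2 dimension, then ask an LLM
# to merge their strengths into one optimized prompt.

DIMENSIONS = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]

def best_per_dimension(references):
    """references: list of dicts with a 'text' field plus one score per dimension.
    Returns the top-scoring reference prompt text for each dimension."""
    return {dim: max(references, key=lambda r: r[dim])["text"] for dim in DIMENSIONS}

def merge_instruction(task, exemplars):
    """Assemble the reflective instruction handed to the LLM (wording is illustrative)."""
    lines = [f"Best prompt for {dim}: {text}" for dim, text in exemplars.items()]
    return (
        f"Prompt to optimize: {task}\n"
        + "\n".join(lines)
        + "\nCombine the strengths of each exemplar into a single improved prompt."
    )
```

The actual framework would send the assembled instruction to an LLM; here the integration step is left to the model rather than hard-coded.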
📝 Abstract
Automatic prompt optimization has recently emerged as a strategy for improving the quality of prompts used in Large Language Models (LLMs), with the goal of generating more accurate and useful responses. However, most prior work focuses on direct prompt refinement or model fine-tuning, overlooking the potential of leveraging LLMs' inherent reasoning capability to learn from contrasting examples. In this paper, we present Contrastive Reasoning Prompt Optimization (CRPO), a novel framework that formulates prompt optimization as a retrieval-augmented reasoning process. Our approach retrieves the top-k reference prompts from the HelpSteer2 dataset, an open-source collection annotated for helpfulness, correctness, coherence, complexity, and verbosity, and constructs two complementary optimization paradigms: (1) tiered contrastive reasoning, where the LLM compares high-, medium-, and low-quality prompts to refine its own generation through reflective reasoning, and (2) multi-metric contrastive reasoning, where the LLM analyzes the best prompts along each evaluation dimension and integrates their strengths into an optimized prompt. By explicitly contrasting high- and low-quality exemplars, CRPO enables the model to deduce why certain prompts succeed while others fail, thereby achieving more robust and interpretable optimization. Experimental results on the HelpSteer2 benchmark demonstrate that CRPO significantly outperforms baselines. Our findings highlight the promise of contrastive, retrieval-augmented reasoning for advancing automatic prompt optimization.
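The retrieval-plus-tiering pipeline in paradigm (1) can be outlined as follows. This is a minimal sketch under stated assumptions: the retrieval step is a placeholder (real retrieval would use embedding similarity against the HelpSteer2 library), and the tier split and prompt wording are illustrative choices, not the paper's.

```python
# Hypothetical sketch of CRPO's tiered contrastive reasoning:
# retrieve k reference prompts, split them into quality tiers, and build
# a reflective instruction asking the LLM to contrast the tiers.

from dataclasses import dataclass

@dataclass
class RefPrompt:
    text: str
    helpfulness: float  # HelpSteer2-style quality score

def retrieve_top_k(library, k):
    # Placeholder retrieval: take the k highest-scoring references.
    # A real system would rank by semantic similarity to the task prompt.
    return sorted(library, key=lambda p: p.helpfulness, reverse=True)[:k]

def split_tiers(prompts):
    # Partition retrieved prompts into high/medium/low thirds by score.
    ranked = sorted(prompts, key=lambda p: p.helpfulness, reverse=True)
    n = len(ranked)
    return ranked[: n // 3], ranked[n // 3 : 2 * n // 3], ranked[2 * n // 3 :]

def build_contrastive_prompt(task, high, medium, low):
    # Assemble the reflective instruction (wording is illustrative).
    def section(name, group):
        return f"{name} examples:\n" + "\n".join(f"- {p.text}" for p in group)
    return (
        f"Prompt to optimize: {task}\n\n"
        + section("High-quality", high) + "\n\n"
        + section("Medium-quality", medium) + "\n\n"
        + section("Low-quality", low) + "\n\n"
        + "Explain why the high-quality prompts outperform the low-quality ones, "
        + "then rewrite the prompt above to match the strengths you identified."
    )
```

The final instruction would be sent to the LLM, whose reflective answer yields the optimized prompt; the contrast across tiers is what lets the model infer why some prompts fail.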