PMPO: Probabilistic Metric Prompt Optimization for Small and Large Language Models

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing prompt optimization methods rely on costly sampling, manual annotation, or model-based self-evaluation, exhibiting poor scalability—particularly for small-scale or non-instruction-tuned models. This paper proposes a lightweight, sampling-free prompt optimization framework that requires only forward passes and token-level cross-entropy loss, eliminating the need for output generation, human annotation, or self-critique. It unifies support for both supervised learning and preference optimization tasks. By performing mask sensitivity analysis, the method identifies low-quality prompt segments and refines them via loss-driven rewriting and filtering. Experiments demonstrate state-of-the-art average accuracy on BBH, substantial improvements on GSM8K and AQUA-RAT, and a >19-percentage-point win-rate gain on AlpacaEval 2.0. These results validate the framework’s efficiency and strong generalization across diverse model scales and task types.

📝 Abstract
Prompt optimization offers a practical and broadly applicable alternative to fine-tuning for improving large language model (LLM) performance. However, existing methods often rely on costly output generation, self-critiquing abilities, or human-annotated preferences, which limit their scalability, especially for smaller or non-instruction-tuned models. We introduce PMPO (Probabilistic Metric Prompt Optimization), a unified framework that refines prompts using token-level cross-entropy loss as a direct, lightweight evaluation signal. PMPO identifies low-quality prompt segments by masking and measuring their impact on loss, then rewrites and selects improved variants by minimizing loss over positive and negative examples. Unlike prior methods, it requires no output sampling or human evaluation during optimization, relying only on forward passes and log-likelihoods. PMPO supports both supervised and preference-based tasks through a closely aligned loss-based evaluation strategy. Experiments show that PMPO consistently outperforms prior methods across model sizes and tasks: it achieves the highest average accuracy on BBH, performs strongly on GSM8K and AQUA-RAT, and improves AlpacaEval 2.0 win rates by over 19 points. These results highlight PMPO's effectiveness, efficiency, and broad applicability.
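The abstract describes two loss-only steps: mask each prompt segment and measure the change in cross-entropy to locate low-quality segments, then select among rewritten variants by minimizing loss. The toy sketch below (not the authors' code) illustrates both steps; in a real setup `loss_fn` would be a language model's token-level cross-entropy over labeled examples, while here it is a deterministic stand-in.

```python
def mask_sensitivity(segments, loss_fn):
    """For each segment, compute how much the loss drops when that
    segment is masked out. A large positive gain flags a low-quality
    segment worth rewriting."""
    base = loss_fn(" ".join(segments))
    gains = []
    for i in range(len(segments)):
        masked = " ".join(s for j, s in enumerate(segments) if j != i)
        gains.append((i, base - loss_fn(masked)))
    # Most harmful segment first.
    return sorted(gains, key=lambda t: t[1], reverse=True)

def select_best(variants, loss_fn):
    """Pick the candidate prompt with the lowest loss -- no output
    sampling or human judgment required, only forward-pass scores."""
    return min(variants, key=loss_fn)

# Toy stand-in for model cross-entropy: penalizes a known-bad phrase
# so the example is deterministic (hypothetical, for illustration only).
def toy_loss(prompt):
    return len(prompt) * 0.01 + (5.0 if "irrelevant filler" in prompt else 0.0)

segments = ["Solve the math problem step by step.", "irrelevant filler text."]
worst_index, gain = mask_sensitivity(segments, toy_loss)[0]
print(worst_index)  # → 1 (the filler segment hurts the most)
```

The key design point the paper emphasizes is that both functions need only scalar loss values from forward passes, which is why the method also works for small or non-instruction-tuned models that cannot reliably self-critique.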
Problem

Research questions and friction points this paper is trying to address.

Optimizing prompts without costly output generation or human input
Improving language model performance across sizes and tasks efficiently
Using token-level loss to refine prompts without output sampling or human evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses token-level cross-entropy loss for evaluation
Optimizes prompts without output sampling or human input
Supports both supervised and preference-based tasks
Chenzhuo Zhao
Peking University
Ziqian Liu
Unaffiliated
Xingda Wang
Peking University
Junting Lu
Peking University
Chaoyi Ruan
National University of Singapore