🤖 AI Summary
Large language models (LLMs) are highly sensitive to prompting, and manually designing effective prompts remains challenging. Existing automated prompting methods face three key bottlenecks: reliance on large task-specific datasets, computationally expensive optimization, and inability to adapt prompts at the sample level. To address these limitations, we propose GPS, a general, sample-wise prompt generation framework that requires no task-specific training or optimization. GPS employs a prompter trained with reinforcement learning, augmented with a novel regularization constraint, and uses minimum Bayes risk decoding to enable stable, input-adaptive prompt generation. Evaluated in zero-shot settings, GPS improves both generalization and robustness across diverse tasks, including text simplification, summarization, and classification, remaining competitive with strong baselines that were trained on those tasks. Notably, on in-domain mathematical reasoning (GSM8K), GPS achieves state-of-the-art performance.
📝 Abstract
LLMs are sensitive to prompting: task performance often hinges on subtle, sometimes imperceptible variations in phrasing. As a result, crafting effective prompts manually remains challenging and time-consuming. Recent automatic prompting methods mitigate this difficulty but face three key limitations: (i) for each new task, they require large datasets to train good prompts; (ii) they rely on costly optimization loops that may take hours; (iii) they typically produce a single task-level prompt that does not adapt to the individual input problem to be solved. We propose GPS, the first general-purpose, per-sample prompting method. Without any task-specific tuning, GPS generates a tailored prompt for each unseen input, improving performance across diverse tasks. The prompter is trained with reinforcement learning on a suite of training tasks and includes a novel regularization that adapts it effectively to per-sample prompting. Finally, we employ Minimum Bayes Risk decoding to stabilize inference. Empirically, GPS is competitive: it attains the second-best results among baselines on text simplification, the third-best on summarization, and on-par results on classification, despite not training on any of these tasks, in contrast to the baselines. For in-domain prompting, it achieves state-of-the-art results on GSM8K. Our work shows the potential of a novel and effective paradigm for automatic prompting: generating adaptive, input-specific prompts without extensive optimization and without access to a task-specific training set. Our code is available at https://github.com/Batorskq/GPS.
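The Minimum Bayes Risk decoding step mentioned above can be illustrated with a minimal sketch: among several candidate prompts, select the one with the highest expected utility against the other candidates, which favors a "consensus" candidate over outliers. The candidate strings and the token-overlap utility below are illustrative assumptions, not the paper's actual prompter outputs or utility function.

```python
# Minimal MBR-style selection over candidate prompts (illustrative sketch).
# Assumption: a uniform distribution over candidates and a toy Jaccard utility;
# the paper's actual utility and candidate generation are not reproduced here.

def utility(a: str, b: str) -> float:
    """Toy utility: token-level Jaccard similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def mbr_select(candidates: list[str]) -> str:
    """Return the candidate with the highest average utility against the
    remaining candidates (its expected utility under a uniform model)."""
    def expected_utility(c: str) -> float:
        others = [o for o in candidates if o is not c]
        return sum(utility(c, o) for o in others) / max(len(others), 1)
    return max(candidates, key=expected_utility)

candidates = [
    "Solve the problem step by step and give the final answer",
    "Think step by step then state the final answer",
    "Answer briefly",
]
print(mbr_select(candidates))  # the consensus-like candidate wins over the outlier
```

Under this toy utility, the two step-by-step prompts overlap heavily, so one of them is chosen and the dissimilar "Answer briefly" outlier is rejected, which is the stabilizing effect MBR decoding provides at inference time.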