QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning

📅 2024-08-20
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing prompt optimization methods neglect query dependency and rely heavily on frequent online LLM interactions, resulting in suboptimal performance and high computational overhead. To address this, we propose QPO, a multi-loop offline reinforcement learning framework that establishes a query-dependent prompt optimization paradigm. Leveraging existing prompt demonstration data, our method iteratively enhances its training data within a fully offline closed loop via three components: prompt distillation, lightweight fine-tuning of a small surrogate model, and dynamic data augmentation, eliminating the need for any online LLM queries. Evaluated across diverse NLP and mathematical reasoning tasks, the approach significantly improves the zero-shot and few-shot performance of LLMs ranging from 7B to 70B parameters, and reduces the computational cost of prompt optimization by up to an order of magnitude compared to online baselines, enabling scalable, efficient, and query-adaptive prompt engineering.

📝 Abstract
Prompt engineering has demonstrated remarkable success in enhancing the performance of large language models (LLMs) across diverse tasks. However, most existing prompt optimization methods focus only on task-level performance, overlooking the importance of query-preferred prompts, which leads to suboptimal performance. Additionally, these methods rely heavily on frequent interactions with LLMs to obtain feedback for guiding the optimization process, incurring substantial redundant interaction costs. In this paper, we introduce Query-dependent Prompt Optimization (QPO), which leverages multi-loop offline reinforcement learning to iteratively fine-tune a small pretrained language model to generate optimal prompts tailored to the input queries, thus significantly improving the prompting effect on the large target LLM. We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks, thereby circumventing the expense of online interactions. Furthermore, we continuously augment the offline dataset with the prompts generated in each loop, since the prompts from the fine-tuned model are expected to outperform the source prompts in the original dataset. These iterative loops bootstrap the model towards generating optimal prompts. Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
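As a rough illustration of the multi-loop procedure the abstract describes, the sketch below bootstraps a query-dependent prompt generator from offline demonstration data. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Demo` record, the `finetune` and `score_offline` helpers, and the loop count are all hypothetical stand-ins for the paper's components.

```python
# Minimal, hypothetical sketch of QPO's multi-loop offline training.
# The data format, reward source, and fine-tuning step are illustrative
# stand-ins, not the authors' actual API.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Demo:
    query: str     # input query from an open-sourced task
    prompt: str    # candidate prompt paired with the query
    reward: float  # cached benchmark score for (query, prompt); no online LLM call

def qpo_loops(
    data: List[Demo],
    finetune: Callable[[List[Demo]], Callable[[str], str]],  # offline RL step
    score_offline: Callable[[str, str], float],              # e.g. a learned reward model
    num_loops: int = 3,                                      # hypothetical loop count
) -> Callable[[str], str]:
    """Return a query -> prompt generator after `num_loops` offline loops."""
    generator = finetune(data)
    for _ in range(num_loops):
        # Generate a query-dependent prompt for every query in the dataset.
        new_rows = [Demo(d.query, generator(d.query), 0.0) for d in data]
        # Score the generated prompts offline (cached evaluations or a reward
        # model), keeping the whole loop free of target-LLM queries.
        for row in new_rows:
            row.reward = score_offline(row.query, row.prompt)
        # Augment: generated prompts are expected to outperform the source
        # prompts, so the next fine-tuning round trains on a stronger dataset.
        data = data + new_rows
        generator = finetune(data)
    return generator
```

The design point carried over from the abstract is that every reward inside the loop comes from cached or offline scoring, so the target LLM is never queried during optimization.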
Problem

Research questions and friction points this paper is trying to address.

Optimizing prompts for query-specific performance in LLMs
Reducing interaction costs with offline reinforcement learning
Enhancing prompt effectiveness without online LLM feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-loop offline reinforcement learning for prompts
Query-preferred prompts via a fine-tuned small model (inference-time use sketched after this list)
Offline dataset augmentation with generated prompts
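At inference time, the innovations above imply a simple deployment pattern: the small fine-tuned model produces a prompt conditioned on each incoming query before the large target LLM is called. A minimal sketch, assuming a plain text-in/text-out interface for both models (the function names and prompt concatenation format are hypothetical):

```python
# Hypothetical inference-time use of the fine-tuned prompt generator; the
# `complete` callable stands in for whatever interface the target LLM exposes.
from typing import Callable

def answer_with_qpo(
    query: str,
    prompt_generator: Callable[[str], str],  # the small fine-tuned model
    complete: Callable[[str], str],          # the large target LLM
) -> str:
    # One forward pass of the small model yields a prompt tailored to this
    # specific query, which is prepended before querying the target LLM.
    tailored_prompt = prompt_generator(query)
    return complete(f"{tailored_prompt}\n\n{query}")
```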
👥 Authors

Yilun Kong
Tsinghua University
Reinforcement Learning, Large Language Models

Hangyu Mao
SenseTime Research

Qi Zhao
Tsinghua University

Bin Zhang
Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences

Jingqing Ruan
Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences

Li Shen
Sun Yat-Sen University

Yongzhe Chang
UNSW/Data 61 PhD, Tsinghua postdoc.
machine learning, reinforcement learning

Xueqian Wang
Tsinghua University
Information Fusion, Target Detection, Radar Imaging, Image Processing

Rui Zhao
SenseTime Research

Dacheng Tao
Nanyang Technological University
artificial intelligence, machine learning, computer vision, image processing, data mining