PromptWise: Online Learning for Cost-Aware Prompt Assignment in Generative Models

📅 2025-05-24
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper addresses how to assign prompts cost-effectively in generative AI services that offer multiple models. Existing approaches largely ignore pricing differences between models and optimize for performance alone. To close this gap, the paper proposes what it presents as the first cost-aware online learning framework for prompt assignment. As user prompts arrive one after another, the method applies a dynamic “low-cost-first, progressive fallback” strategy: cheaper models are queried first, and more expensive models only when the cheaper ones fall short. It estimates task difficulty and model response quality jointly and in real time, enabling Pareto-optimal model selection through adaptive decision thresholds. The mechanism is lightweight, keeping latency low while maximizing cost efficiency. Experiments on puzzle solving, code generation, and code translation show up to a 47% reduction in average service cost, together with higher response satisfaction and higher system throughput.
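
The cheap-first, fall-back-on-failure loop with online quality estimates can be illustrated with a small sketch. It is not PromptWise's algorithm: it swaps in a generic UCB-style success estimate per model, ignores prompt features (task difficulty) entirely, and uses a fixed skip threshold. The classes `ModelArm` and `CheapFirstCascade`, the model names and prices, the threshold value, and the `query` callback (which must report whether an answer was satisfactory, e.g. by running unit tests) are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ModelArm:
    """Running statistics for one model (hypothetical name and price)."""
    name: str
    cost: float
    successes: int = 0
    trials: int = 0

    def optimistic_success(self, t: int, c: float = 1.0) -> float:
        """UCB-style upper estimate of the model's success rate at round t."""
        if self.trials == 0:
            return 1.0  # untried models are treated as promising
        mean = self.successes / self.trials
        bonus = c * math.sqrt(math.log(t + 1) / self.trials)
        return min(1.0, mean + bonus)


class CheapFirstCascade:
    """Query models cheapest-first, skipping models judged unlikely to succeed."""

    def __init__(self, arms: List[ModelArm], threshold: float = 0.5):
        self.arms = sorted(arms, key=lambda a: a.cost)  # ascending price
        self.threshold = threshold  # hypothetical fixed skip threshold
        self.t = 0  # number of prompts seen so far

    def assign(self, prompt: str,
               query: Callable[[str, str], bool]) -> Tuple[Optional[str], float]:
        """Return (name of the model that succeeded, or None; total cost spent)."""
        self.t += 1
        spent = 0.0
        for i, arm in enumerate(self.arms):
            is_last = i == len(self.arms) - 1
            # Skip a cheaper model whose optimistic estimate is below the threshold;
            # the most expensive model is always kept as the final fallback.
            if not is_last and arm.optimistic_success(self.t) < self.threshold:
                continue
            spent += arm.cost
            ok = query(arm.name, prompt)  # caller decides what "satisfactory" means
            arm.trials += 1
            arm.successes += int(ok)
            if ok:
                return arm.name, spent
        return None, spent


# Usage with simulated feedback: the small model never solves these prompts,
# so the cascade learns to skip it most of the time.
cascade = CheapFirstCascade([ModelArm("large", 10.0), ModelArm("small", 1.0)])
results = [cascade.assign(f"prompt-{i}", lambda name, p: name == "large")
           for i in range(50)]
total_cost = sum(cost for _, cost in results)
```

Keeping the most expensive model as an unconditional last resort means every prompt that reaches the end of the cascade still gets an attempt, while the shrinking but never-zero exploration bonus makes the cascade occasionally re-test a cheap model it has been skipping.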

Abstract
The rapid advancement of generative AI models has provided users with numerous options to address their prompts. When selecting a generative AI model for a given prompt, users should consider not only the performance of the chosen model but also its associated service cost. The principle guiding such consideration is to select the least expensive model among the available satisfactory options. However, existing model-selection approaches typically prioritize performance, overlooking pricing differences between models. In this paper, we introduce PromptWise, an online learning framework designed to assign a sequence of prompts to a group of large language models (LLMs) in a cost-effective manner. PromptWise strategically queries cheaper models first, progressing to more expensive options only if the lower-cost models fail to adequately address a given prompt. Through numerical experiments, we demonstrate PromptWise's effectiveness across various tasks, including puzzles of varying complexity and code generation/translation tasks. The results highlight that PromptWise consistently outperforms cost-unaware baseline methods, emphasizing that directly assigning prompts to the most expensive models can lead to higher costs and potentially lower average performance.
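
The abstract's point that sending every prompt straight to the most expensive model can cost more follows from a short expected-cost comparison. This is a toy model with assumptions not taken from the paper: two models with per-query prices $c_1 < c_2$, a cheap model $M_1$ that answers a prompt satisfactorily with probability $p_1$, failures that can be detected reliably, and an expensive model $M_2$ that is queried only after $M_1$ fails.

```latex
\begin{aligned}
\mathbb{E}[C_{\text{cheap-first}}] &= c_1 + (1 - p_1)\,c_2, \qquad C_{\text{direct}} = c_2,\\
\mathbb{E}[C_{\text{cheap-first}}] < C_{\text{direct}}
  &\iff c_1 + (1 - p_1)\,c_2 < c_2
  \iff c_1 < p_1\,c_2.
\end{aligned}
```

For example, with $c_1 = 1$, $c_2 = 10$, and $p_1 = 0.6$, cheap-first costs $1 + 0.4 \cdot 10 = 5$ per prompt in expectation versus $10$ for direct assignment. In this toy model, trying the cheap model first pays off whenever its price is below its success rate times the expensive model's price.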
Problem

Cost-aware prompt assignment for generative models
Balancing performance and service cost in LLM selection
Online learning to prioritize cheaper models effectively
Innovation

Online learning for cost-aware prompt assignment
Prioritizes cheaper models first for cost efficiency
Consistently outperforms cost-unaware baseline methods