🤖 AI Summary
Existing prompt optimization methods neglect the strong coupling between prompt design and the inference strategies deployed at runtime (e.g., Best-of-N Sampling, Majority Voting), while users' preferences for trading off multi-objective performance against inference budget critically influence which configuration is optimal. To address this, the authors propose IAPO, an Inference-Aware Prompt Optimization framework that jointly optimizes the prompt and the inference scale, enabling co-optimization under multi-objective trade-offs and budget constraints. Its core component is the Prompt Scaling via Sequential Trimming (PSST) algorithm, a fixed-budget training procedure that adapts to mainstream inference strategies and carries finite-budget guarantees on error probability. Experiments across six text generation and reasoning tasks demonstrate IAPO's effectiveness: it significantly improves alignment with black-box large language models, validating the necessity of inference-aware joint optimization.
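The summary names two runtime inference strategies, Best-of-N Sampling and Majority Voting. The paper does not specify an implementation, so the following is a minimal illustrative sketch of both strategies; the `generate` and `score` callables are hypothetical placeholders for an LLM sampler and a reward/verifier model.

```python
from collections import Counter

def best_of_n(prompt, generate, score, n=8):
    """Best-of-N Sampling: draw n candidate completions for the prompt
    and keep the one with the highest score under a verifier/reward model."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

def majority_vote(prompt, generate, n=8):
    """Majority Voting: draw n completions and return the most frequent
    final answer (self-consistency style aggregation)."""
    answers = [generate(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

Both strategies trade extra computation (the factor `n`, i.e., the inference scale) for higher answer quality, which is exactly the budget knob IAPO co-optimizes with the prompt.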
📝 Abstract
Prompt optimization methods have demonstrated significant effectiveness in aligning black-box large language models (LLMs). In parallel, inference scaling strategies such as Best-of-N Sampling and Majority Voting have also been shown to enhance alignment and performance by trading off computation. However, existing prompt optimization approaches are inference-strategy-agnostic; that is, they optimize prompts without regard to the inference strategy employed during deployment. This constitutes a significant methodological gap, as our empirical and theoretical analysis reveals a strong interdependence between these two paradigms. Moreover, we find that user preferences regarding trade-offs among multiple objectives and inference budgets substantially influence the choice of prompt and inference configuration. To address this gap, we introduce a novel unified framework named IAPO (Inference-Aware Prompt Optimization) that jointly optimizes the prompt and inference scale while remaining aware of the inference budget and different task objectives. We then develop a fixed-budget training algorithm for IAPO, which we call PSST (Prompt Scaling via Sequential Trimming), and analyze finite-budget guarantees on error probability. Finally, we evaluate the effectiveness of PSST on six different tasks, including multi-objective text generation and reasoning, and demonstrate the critical role of incorporating inference-awareness when aligning black-box LLMs through prompt optimization.
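The abstract describes PSST as a fixed-budget algorithm that sequentially trims candidate configurations, but gives no implementation details. The sketch below is therefore only a generic successive-halving-style illustration of fixed-budget selection over (prompt, inference-scale) configurations, not the authors' actual algorithm; `arms` and `pull` are assumed stand-ins for candidate configurations and one noisy LLM evaluation, respectively.

```python
import math

def sequential_trimming(arms, pull, budget):
    """Fixed-budget best-configuration selection by sequential trimming.

    arms:   list of candidate (prompt, inference-scale) configurations.
    pull:   pull(arm) -> one noisy reward sample (one LLM evaluation).
    budget: total number of evaluations allowed.
    """
    survivors = list(arms)
    rounds = max(1, math.ceil(math.log2(len(arms))))
    per_round = budget // rounds
    means = {a: 0.0 for a in survivors}
    counts = {a: 0 for a in survivors}
    for _ in range(rounds):
        if len(survivors) == 1:
            break
        pulls_each = max(1, per_round // len(survivors))
        for a in survivors:
            for _ in range(pulls_each):
                counts[a] += 1
                # incremental mean update of the empirical reward
                means[a] += (pull(a) - means[a]) / counts[a]
        # trim the empirically worse half of the surviving configurations
        survivors.sort(key=lambda a: means[a], reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
    return survivors[0]
```

Schemes of this family concentrate the evaluation budget on promising configurations, which is the standard route to finite-budget bounds on the probability of returning a suboptimal arm.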