Prompt Codebooks: Discrete Compositional Optimization for Language Model Instruction Refinement

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing automatic prompt optimization methods: they treat each prompt as a monolithic string, so they cannot model reusable sub-behaviors, their updates are brittle, and they adapt poorly across inputs. The authors propose Prompt Codebooks (PCO), a compositional framework that recasts prompt construction as selecting and composing entries from a finite codebook of natural-language “instinct” units. An LLM-based encoder routes each input to a small subset of codebook entries, a generator composes them into a prompt for the frozen target model, and a critic's structured verdict is decomposed into per-variable textual gradients that jointly train the encoder, generator, and codebook under a language-valued min-max objective. Routing is per-instance, so different inputs in the same task can receive different instinct compositions. Across six benchmarks on Qwen3-8B and LLaMA-3.1-8B, PCO improves over zero-shot by up to 30.36 points, surpasses the strongest prior baseline, GEPA, by 3.34 points on HotpotQA and 1.11 points in aggregate, and reduces deployed prompt length by up to 14.1x versus MIPROv2 and 3.0x versus GEPA using only K=16 instincts.

📝 Abstract

Automatic prompt optimization (APO) has driven significant gains in LLM-based agentic workflows. However, existing methods treat each task's prompt as a monolithic, instance-blind string optimized through global edits, producing brittle updates and preventing the reuse of learned sub-behaviors. We propose Prompt Codebooks (PCO), a novel compositional prompt optimization framework that recasts APO as discrete learning over a finite vocabulary of natural-language instincts - atomic, reusable instruction units. PCO organizes prompt-construction knowledge in a discrete codebook and routes each input to a small subset of entries via an LLM-based encoder; a generator composes them into a prompt for the frozen target model; a critic emits a structured verdict that decomposes by attribution into per-variable textual gradients, jointly training the encoder, generator, and codebook under a language-valued min-max objective. The resulting routing is per-instance: different inputs in the same task receive different instinct compositions, a regime structurally inexpressible under instance-blind methods. Across six benchmarks on Qwen3-8B and LLaMA-3.1-8B, PCO improves over zero-shot by up to +30.36 points, surpasses the strongest prior baseline (GEPA) by +3.34 on HotpotQA and +1.11 in aggregate, and reduces deployed prompt length by up to 14.1x versus MIPROv2 and 3.0x versus GEPA using only K=16 instincts.
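The per-instance routing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword-overlap `route` function is a hypothetical stand-in for the LLM-based encoder, `compose` is a hypothetical stand-in for the generator, and the three-entry `CODEBOOK` and its `keywords` field are invented for illustration (the paper reports K=16 instincts). The critic and the textual-gradient training loop are omitted.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Instinct:
    """One codebook entry: a short, reusable natural-language instruction."""
    text: str
    keywords: FrozenSet[str]  # hypothetical routing hints, not from the paper


# Illustrative codebook; the entries are assumptions, not the learned instincts.
CODEBOOK: List[Instinct] = [
    Instinct("Show each arithmetic step before the answer.",
             frozenset({"times", "sum", "calculate"})),
    Instinct("Name the entity the question asks about first.",
             frozenset({"who", "when", "where"})),
    Instinct("Check every item mentioned before answering.",
             frozenset({"compare", "both"})),
]


def route(query: str, codebook: List[Instinct], top_m: int = 2) -> List[int]:
    """Toy encoder: pick up to top_m entries by keyword overlap with the query."""
    tokens = set(query.lower().split())
    scored = [(len(entry.keywords & tokens), idx)
              for idx, entry in enumerate(codebook)]
    # Highest overlap first, ties broken by lower index; zero-overlap entries dropped.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [idx for score, idx in ranked if score > 0][:top_m]


def compose(query: str, codebook: List[Instinct], selected: List[int]) -> str:
    """Toy generator: join the selected instincts into a prompt for the frozen model."""
    lines = [f"- {codebook[i].text}" for i in selected]
    return "\n".join(["Instructions:", *lines, "", f"Input: {query}"])


if __name__ == "__main__":
    for q in ("what is 12 times 7", "who directed both films"):
        print(compose(q, CODEBOOK, route(q, CODEBOOK)))
        print("---")
```

In this sketch the two queries select different codebook entries, which is the behavior an instance-blind method with a single fixed prompt cannot express. In PCO the selection and the codebook text are learned rather than hand-written.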

Problem

Research questions and friction points this paper is trying to address.

prompt optimization

compositional learning

language models

instruction refinement

discrete optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Prompt Codebooks

Compositional Prompt Optimization

Discrete Learning