🤖 AI Summary
This work addresses two key challenges in large language model (LLM) prompt engineering: the difficulty of automatically generating long, effective prompts, and incomplete semantic coverage. To tackle these, we propose a task-semantic facet learning paradigm that decomposes tasks into generalizable semantic dimensions, such as counterexamples, explanations, and analogies. Based on this, we design UniPrompt, an algorithm integrating input-space clustering, batch-level semantic feedback, and decoupled modeling of prompt sections to enable iterative refinement (addition, deletion, modification). To our knowledge, UniPrompt is the first method to enable structured, automated generation of complex long prompts. Extensive experiments across multiple benchmarks and real-world tasks demonstrate its superiority over both human-optimized prompts and state-of-the-art baselines, achieving an average accuracy improvement of 8.2%. Moreover, UniPrompt exhibits strong cross-task generalization.
📄 Abstract
Given a task in the form of a basic description and its training examples, prompt optimization is the problem of synthesizing the given information into a text prompt for a large language model. Humans solve this problem by also considering the different facets that define a task (e.g., counterexamples, explanations, analogies) and including them in the prompt. However, it is unclear whether existing algorithmic approaches, based on iteratively editing a given prompt or automatically selecting a few in-context examples, can cover the multiple facets required to solve a complex task. In this work, we view prompt optimization as learning multiple facets of a task from a set of training examples. We exploit structure in the prompt optimization problem and break down a prompt into loosely coupled semantic sections. The proposed algorithm, UniPrompt, (1) clusters the input space and uses clustered batches so that each batch likely corresponds to a different facet of the task, and (2) utilizes a feedback mechanism to propose adding, editing, or deleting a section, which in turn is aggregated over a batch to capture generalizable facets. Empirical evaluation on multiple datasets and a real-world task shows that prompts generated using UniPrompt obtain higher accuracy than human-tuned prompts and those from state-of-the-art methods. In particular, our algorithm can generate long, complex prompts that existing methods are unable to generate. Code for UniPrompt is available at https://aka.ms/uniprompt.
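The loop described in the abstract, clustering examples into facet-aligned batches, collecting per-example edit proposals, and aggregating them per batch before touching the prompt, can be sketched as follows. This is a minimal illustration, not the paper's implementation: `key_fn` and `feedback_fn` are hypothetical stand-ins for the embedding-based clustering and the LLM feedback call, and the majority-vote aggregation is one simple way to keep only generalizable edits.

```python
# Hedged sketch of a UniPrompt-style refinement loop. The prompt is modeled
# as named, loosely coupled sections; feedback_fn stands in for an LLM call
# that proposes an (op, section, text) edit for one training example.
from collections import Counter, defaultdict

def cluster_examples(examples, key_fn):
    """Group examples so each batch likely reflects one task facet."""
    clusters = defaultdict(list)
    for ex in examples:
        clusters[key_fn(ex)].append(ex)
    return list(clusters.values())

def apply_edit(sections, op, name, text=None):
    """Apply an add/edit/delete proposal to the prompt's sections."""
    if op == "delete":
        sections.pop(name, None)
    else:  # "add" and "edit" both set the section text
        sections[name] = text
    return sections

def optimize_prompt(sections, examples, key_fn, feedback_fn, rounds=1):
    """Aggregate per-example proposals over each batch (majority vote
    here) so only edits that generalize across the batch are applied."""
    for _ in range(rounds):
        for batch in cluster_examples(examples, key_fn):
            proposals = [feedback_fn(sections, ex) for ex in batch]
            op, name, text = Counter(proposals).most_common(1)[0][0]
            apply_edit(sections, op, name, text)
    return sections
```

A toy run with a hand-written feedback function: negative examples vote to add a "counterexamples" section, so that edit survives aggregation for the negative batch.

```python
sections = {"task": "Classify the sentiment of a review."}
examples = [("great movie", "pos"), ("terrible plot", "neg"), ("awful acting", "neg")]

def toy_feedback(sec, ex):  # hypothetical stand-in for LLM feedback
    if ex[1] == "neg":
        return ("add", "counterexamples", "Harsh adjectives signal a negative review.")
    return ("edit", "task", "Classify the sentiment of a movie review.")

out = optimize_prompt(sections, examples,
                      key_fn=lambda ex: ex[1], feedback_fn=toy_feedback)
```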