iPOE: Interpretable Prompt Optimization via Explanations

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of transparency in conventional prompt optimization methods, which often fail to explain why a prompt change improves performance. To enhance interpretability, the study introduces an explainable prompt optimization framework inspired by human annotation guidelines. The approach builds structured prompts that non-expert users can follow by distilling actionable guidelines from human-written or automatically generated decision rationales and then refining the guideline set through operations such as removal, addition, reordering, and merging. Experiments on four datasets show gains of up to 31% over prompts without guidelines and up to 35% over prompts with randomly selected guidelines. The results also indicate that automatically generated explanations can substitute for human-provided ones, offering a favorable balance between performance and interpretability.

📝 Abstract

Prompt optimization has often been framed as a discrete search problem to find high-performing and robust instructions for an LLM. However, the search result might not make it transparent why and where specific prompt changes lead to performance gains. This is in contrast to how humans are instructed for annotation tasks. Here, researchers carefully design annotation guidelines, leading to enhanced annotation consistency. Our paper aims at joining these two approaches and introduces iPOE, a novel interpretable prompt optimization strategy via explanations. We guide the prompt optimization process with guidelines created automatically from explanations of annotation decisions (either automatically generated or from humans). This set of guidelines is further optimized by a series of operations, including removing, adding, shuffling, and merging. The resulting prompt includes guidelines that instruct the annotation, making the decision process of the LLM and the optimization transparent. It therefore also supports laypeople in the area of prompt optimization, particularly in challenging domains requiring expertise. In our experiments on four datasets, we find that iPOE can improve over prompts without guidelines and with randomly selected guidelines by up to $31\%$ and $35\%$, respectively. Moreover, LLM explanations can replace human explanations in the proposed method.
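The guideline operations named in the abstract (removing, adding, shuffling, merging) can be illustrated with a small sketch. This is not the authors' implementation: the greedy accept-if-better loop, the `score` callback (which in practice would evaluate an LLM on held-out examples), the string-concatenation merge, and all function names are assumptions made for illustration only.

```python
import random
from typing import Callable, List, Tuple

Guidelines = List[str]


def remove_op(g: Guidelines, pool: Guidelines, rng: random.Random) -> Guidelines:
    """Drop one guideline at random (keeps at least one)."""
    if len(g) <= 1:
        return list(g)
    i = rng.randrange(len(g))
    return g[:i] + g[i + 1:]


def add_op(g: Guidelines, pool: Guidelines, rng: random.Random) -> Guidelines:
    """Append one unused guideline from the candidate pool, if any remain."""
    candidates = [p for p in pool if p not in g]
    if not candidates:
        return list(g)
    return g + [rng.choice(candidates)]


def shuffle_op(g: Guidelines, pool: Guidelines, rng: random.Random) -> Guidelines:
    """Reorder the guidelines; the content is unchanged."""
    out = list(g)
    rng.shuffle(out)
    return out


def merge_op(g: Guidelines, pool: Guidelines, rng: random.Random) -> Guidelines:
    """Combine two guidelines into one.

    Assumption: plain concatenation stands in for whatever merging
    procedure a real system would use (e.g. an LLM rewrite).
    """
    if len(g) < 2:
        return list(g)
    i, j = sorted(rng.sample(range(len(g)), 2))
    merged = g[i] + " " + g[j]
    return [x for k, x in enumerate(g) if k not in (i, j)] + [merged]


OPS = [remove_op, add_op, shuffle_op, merge_op]


def optimize(
    initial: Guidelines,
    pool: Guidelines,
    score: Callable[[Guidelines], float],
    steps: int = 50,
    seed: int = 0,
) -> Tuple[Guidelines, float]:
    """Greedy search (an assumed strategy): keep an edit only if the score improves."""
    rng = random.Random(seed)
    best, best_score = list(initial), score(initial)
    for _ in range(steps):
        candidate = rng.choice(OPS)(best, pool, rng)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score


def build_prompt(task: str, guidelines: Guidelines) -> str:
    """Render the task instruction followed by a numbered guideline list."""
    lines = [task, "", "Guidelines:"]
    lines += [f"{i}. {g}" for i, g in enumerate(guidelines, 1)]
    return "\n".join(lines)
```

Because every accepted step corresponds to one named operation on a readable guideline list, the sequence of accepted edits doubles as a trace of why the final prompt looks the way it does, which is the kind of transparency the abstract describes.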

Problem

Research questions and friction points this paper is trying to address.

prompt optimization

interpretability

explanations

annotation guidelines

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

interpretable prompt optimization

explanation-guided prompting

annotation guidelines