🤖 AI Summary
In-context learning (ICL) is highly sensitive to the order of demonstration examples, yet systematic methods for optimizing that order have been lacking. OptiSeq is a gradient-free, fine-tuning-free framework for ordering few-shot examples. Its contributions are threefold: (1) an empirical analysis showing the strong effect of example order on ICL efficacy; (2) a lightweight scoring heuristic based on the log-probabilities the LLM assigns to candidate completions; and (3) an efficient search strategy that prunes the space of order permutations. Evaluated across multiple state-of-the-art LLMs and diverse benchmarks, including code generation and logical reasoning tasks, OptiSeq consistently improves accuracy by 6–10.5 percentage points, strengthening both the robustness and the peak performance of ICL.
📝 Abstract
Developers using LLMs in their applications and agents have provided plenty of anecdotal evidence that in-context learning (ICL) is fragile. In addition to the quantity and quality of examples, we show that the order in which the in-context examples are listed in the prompt affects the output of the LLM and, consequently, downstream task performance. In this paper, we present OptiSeq, which introduces a score based on log-probabilities of LLM outputs to prune the universe of possible example orderings in few-shot ICL and recommend the best order(s) by distinguishing between correct and incorrect outputs resulting from different order permutations. Through a detailed empirical evaluation on multiple LLMs, datasets, and prompts, we demonstrate that OptiSeq improves accuracy by 6–10.5 percentage points across multiple tasks.
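To make the core idea concrete, here is a minimal sketch of log-probability-based ordering selection. It is not the authors' implementation: the `logprob_fn` callback, the prompt template, and the simple `max_orders` cap (a stand-in for OptiSeq's actual pruning strategy) are all assumptions introduced for illustration.

```python
from itertools import permutations
from typing import Callable, List, Optional, Sequence, Tuple

Example = Tuple[str, str]  # (input, expected output) pair

def score_ordering(
    ordering: Sequence[Example],
    candidate_output: str,
    logprob_fn: Callable[[str, str], float],
) -> float:
    """Score one ordering by the log-probability the LLM assigns to a
    candidate completion given the few-shot prompt built from that ordering.
    `logprob_fn(prompt, completion)` is a hypothetical wrapper around an
    LLM API that returns the total log-probability of `completion`."""
    prompt = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in ordering)
    return logprob_fn(prompt, candidate_output)

def best_ordering(
    examples: Sequence[Example],
    candidate_output: str,
    logprob_fn: Callable[[str, str], float],
    max_orders: Optional[int] = None,
) -> Tuple[Example, ...]:
    """Search over permutations of the few-shot examples and return the
    ordering with the highest score. `max_orders` caps the number of
    permutations evaluated; the paper's pruning is more sophisticated."""
    scored: List[Tuple[float, Tuple[Example, ...]]] = []
    for i, order in enumerate(permutations(examples)):
        if max_orders is not None and i >= max_orders:
            break
        scored.append((score_ordering(order, candidate_output, logprob_fn), order))
    # Highest-scoring ordering wins; ties break deterministically on the tuple.
    return max(scored)[1]
```

In practice `logprob_fn` would query the model's token log-probabilities (many APIs expose these), and the brute-force enumeration would be replaced by the pruned search described in the paper, since the number of permutations grows factorially with the example count.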