🤖 AI Summary
This work addresses the high sensitivity of in-context learning performance in large language models to the order of demonstration examples, a challenge exacerbated by the infeasibility of exhaustive permutation search. The authors propose PLR, a method that reformulates discrete ordering as probabilistic modeling by leveraging the Plackett–Luce model to learn a distribution over example sequences. Model parameters are iteratively optimized using task-level evaluation metrics, and efficient sampling is enabled via Gumbel perturbations. Notably, PLR operates without requiring label confidence scores, making it applicable to settings without label probabilities, such as mathematical reasoning, and thus overcoming a limitation of conventional label-dependent approaches. Experiments demonstrate that PLR consistently improves few-shot accuracy across multiple classification benchmarks with 4–32 demonstrations and yields meaningful gains on mathematical reasoning tasks.
📝 Abstract
In-context learning (ICL) adapts large language models by conditioning on a small set of demonstration examples, avoiding costly parameter updates. Among other factors, performance is often highly sensitive to the ordering of the examples, yet exhaustive search over the $n!$ possible orderings is infeasible. More efficient ordering methods therefore rely on model confidence measures (e.g., label-probability entropy) over label sets or search directly for the best ordering. We propose PLR, a probabilistic approach to in-context example ordering that replaces discrete ordering search with learning a probability distribution over orderings. PLR models orderings with a Plackett–Luce distribution and iteratively updates its parameters to concentrate probability mass on high-performing orderings under a task-level metric; candidate orderings are sampled efficiently via a Gumbel perturb-and-sort procedure. Experiments on multiple classification benchmarks show that PLR consistently improves few-shot accuracy for $k \in \{4, 8, 16, 32\}$ examples, and we further demonstrate gains on mathematical reasoning tasks where label-based ordering methods are not applicable. Our code is available at https://github.com/Batorskq/PLR.
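The Gumbel perturb-and-sort step can be illustrated with a minimal sketch. It relies on the standard equivalence that adding i.i.d. Gumbel(0, 1) noise to per-item log-scores and sorting in descending order yields an ordering distributed exactly according to the Plackett–Luce model with those scores. The function name and the use of plain Python lists are illustrative assumptions, not the authors' implementation:

```python
import math
import random

def sample_pl_ordering(log_scores, rng):
    """Sample one ordering from a Plackett-Luce distribution via the
    Gumbel perturb-and-sort trick (a sketch, not the paper's exact code)."""
    # Perturb each example's log-score with i.i.d. Gumbel(0, 1) noise,
    # generated by inverse transform: g = -log(-log(u)), u ~ Uniform(0, 1).
    perturbed = [s - math.log(-math.log(rng.random())) for s in log_scores]
    # Sorting indices by perturbed score, descending, gives a sample whose
    # distribution is Plackett-Luce with weights exp(log_scores).
    return sorted(range(len(log_scores)), key=lambda i: -perturbed[i])
```

In a training loop of the kind the abstract describes, one would repeatedly sample candidate orderings this way, evaluate each under the task-level metric, and update the log-scores to favor high-performing orderings.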