Monte Carlo Sampling for Analyzing In-Context Examples

📅 2025-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study challenges the universality of empirical heuristics in in-context learning (ICL), such as “more examples are always better” and “one-shot is inherently superior to zero-shot,” by systematically investigating the joint effects of example quantity, ordering, and selection. Method: the authors propose a Monte Carlo sampling framework that explicitly models the interaction between example selection and permutation, thereby mitigating the attribution bias that arises from isolating single factors. Contribution/Results: Experiments reveal that conventional quantity guidelines generalize poorly across example sets; optimal example configurations are highly task- and model-dependent, and one-shot performance can even degrade below zero-shot baselines. Moreover, data-value-based “robust” selection strategies introduce implicit optimization pitfalls, yielding lower accuracy than random sampling. These findings support more interpretable ICL design and empirically grounded benchmarking.

📝 Abstract
Prior works have shown that in-context learning is brittle to presentation factors such as the order, number, and choice of selected examples. However, ablation-based guidance on selecting the number of examples may ignore the interplay between different presentation factors. In this work we develop a Monte Carlo sampling-based method to study the impact of the number of examples while explicitly accounting for effects from ordering and example selection. We find that previous guidance on how many in-context examples to select does not always generalize across different sets of selected examples and orderings, and whether one-shot settings outperform zero-shot settings is highly dependent on the selected example. Additionally, inspired by data valuation, we apply our sampling method to in-context example selection to choose examples that perform well across different orderings. We find a negative result: while performance is robust to ordering and number of examples, there is an unexpected performance degradation compared to random sampling.
Problem

Research questions and friction points this paper is trying to address.

Analyzes impact of example count in in-context learning
Studies interplay between example order, selection, and quantity
Evaluates robustness of example selection across different orderings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Monte Carlo sampling for in-context example analysis
Evaluates example count, order, and selection interplay
Applies data valuation to select robust examples
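The core idea above can be sketched as follows. This is a minimal illustration of Monte Carlo sampling over prompt configurations, not the paper's actual implementation: for a given example count `k`, it jointly samples *which* examples are drawn from a pool and *in what order* they appear, then aggregates scores across the sampled configurations. The function names, the placeholder example pool, and the aggregation are all hypothetical.

```python
import random


def monte_carlo_icl_configs(pool, k, num_samples, seed=0):
    """Sample ICL prompt configurations, jointly varying example
    selection and ordering (rather than ablating one factor at a time).
    Returns a list of tuples, each tuple being one ordered prompt."""
    rng = random.Random(seed)
    configs = []
    for _ in range(num_samples):
        chosen = rng.sample(pool, k)  # which k examples (selection)
        rng.shuffle(chosen)           # make the ordering step explicit
        configs.append(tuple(chosen))
    return configs


def summarize(scores):
    """Aggregate accuracy across sampled configurations:
    mean and standard deviation over the Monte Carlo draws."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / len(scores)
    return mean, var ** 0.5
```

In use, one would evaluate a model on each sampled configuration (a step omitted here) and compare the score distributions across different values of `k`, which is how sensitivity to count, selection, and ordering can be measured jointly.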
S. Schoch
Department of Computer Science, University of Virginia
Yangfeng Ji
Computer Science, University of Virginia
Natural Language Processing · Machine Learning