🤖 AI Summary
Problem: The performance of in-context learning (ICL) is highly sensitive to which examples are selected, yet existing selection methods lack a unified optimization objective and rest on fragmented theoretical foundations.
Method: We propose the first unified, representation-based evaluation metric for ICL. It comprises two computable dimensions that correlate with downstream accuracy: *affinity* (semantic similarity between examples and the query) and *diversity* (representational dissimilarity among examples). Our approach extracts intermediate-layer representations from pre-trained language models, constructs a similarity matrix and a diversity measure from them, and selects the example set by jointly optimizing both.
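As a rough illustration of the two dimensions, the sketch below scores a candidate example set from representation vectors that are assumed to have been extracted already. The summary does not give the exact formulas, so the choices here are assumptions for illustration only: affinity is taken as the mean cosine similarity between each example and the query, and diversity as the mean pairwise cosine distance among examples. The function names `affinity` and `diversity` and the helper `_unit_rows` are hypothetical.

```python
import numpy as np


def _unit_rows(x):
    """Scale each vector (last axis) to unit length, guarding against zero norms."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def affinity(demo_reps, query_rep):
    """Assumed definition: mean cosine similarity between each example and the query.

    demo_reps: array of shape (k, d), one representation per example.
    query_rep: array of shape (d,), the query representation.
    """
    demos = _unit_rows(demo_reps)
    query = _unit_rows(query_rep)
    return float(np.mean(demos @ query))


def diversity(demo_reps):
    """Assumed definition: mean pairwise cosine distance among the examples."""
    demos = _unit_rows(demo_reps)
    k = demos.shape[0]
    if k < 2:
        return 0.0  # a single example has no pairwise dissimilarity
    sim = demos @ demos.T                    # (k, k) cosine-similarity matrix
    off_diag = sim[~np.eye(k, dtype=bool)]   # drop self-similarities on the diagonal
    return float(np.mean(1.0 - off_diag))
```

Under these assumptions, a set of examples that all point in the query's direction scores high on affinity but zero on diversity, which is why the two quantities are considered together rather than separately.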
Contribution/Results: Across multiple benchmarks, our metric correlates strongly with task accuracy (average Spearman ρ > 0.85). It also provides the first unified explanatory framework that reproduces and explains the efficacy of prominent selection strategies, including KNN, BERT-KNN, and Self-Adaptive, thereby establishing an interpretable, generalizable theoretical foundation for ICL example selection.
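For readers unfamiliar with the statistic, Spearman's ρ is the Pearson correlation computed on ranks rather than raw values. The following is a minimal sketch of how a metric score could be correlated with accuracy across several example sets; the function name `spearman_rho` is hypothetical, the implementation assumes no tied values, and the numbers in the usage lines are toy placeholders rather than results from the paper.

```python
import numpy as np


def spearman_rho(x, y):
    """Spearman rank correlation, assuming neither input contains tied values."""
    # argsort applied twice turns values into 0-based ranks (valid without ties).
    rank_x = np.argsort(np.argsort(x)).astype(float)
    rank_y = np.argsort(np.argsort(y)).astype(float)
    # Pearson correlation of the ranks is Spearman's rho.
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


# Toy placeholder values (not from the paper): one metric score and one
# accuracy per candidate example set.
metric_scores = [0.21, 0.48, 0.35, 0.90]
accuracies = [0.55, 0.63, 0.61, 0.72]
rho = spearman_rho(metric_scores, accuracies)
```

With tied values, a library routine that assigns average ranks would be the safer choice.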
📝 Abstract
The performance of In-Context Learning (ICL) is highly sensitive to the selected demonstrations. Existing approaches to demonstration selection optimize different objectives and yield inconsistent results. To address this, we propose a unified metric with two components, affinity and diversity, that leverages the ICL model's internal representations. Our experiments show that both affinity and diversity correlate strongly with test accuracy, indicating their effectiveness for demonstration selection. Moreover, we show that the proposed metrics align well with a range of previous methods, offering a unified account of their inconsistent results.