Easier to Judge than to Find: Predicting In-Context Learning Success for Demonstration Selection

📅 2026-05-18
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the high sensitivity of in-context learning to prompt examples and the prohibitive cost of searching for optimal example combinations. To this end, the authors propose DiSP, a novel framework that advances a “judgment over search” paradigm: queries are stratified by difficulty through sampling and discrimination strategies, a lightweight router is trained to predict query difficulty, and dedicated discriminators are assigned to each difficulty tier. During inference, DiSP employs a budget-aware “accept-and-stop” decision policy, outputting a risk-diagnosis label upon failure. Evaluated on five classification benchmarks, DiSP achieves up to a 3.4% absolute accuracy gain over strong baselines and accelerates end-to-end inference by up to 23×.
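The difficulty-stratification step can be pictured as a Monte Carlo estimate: sample several random demonstration contexts for a training query, count how often the model answers correctly, and bin the resulting success rate into a tier. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. `run_trial` is a hypothetical stand-in for one ICL call, and the trial count, context size, and tier thresholds are illustrative choices rather than values from the paper.

```python
import random
from typing import Callable, Optional, Sequence

# Hypothetical stand-in for one ICL trial: returns True if the model answers
# `query` correctly when prompted with the demonstrations `demos`.
TrialFn = Callable[[str, Sequence[str]], bool]


def estimate_success_rate(query: str, pool: Sequence[str], run_trial: TrialFn,
                          n_trials: int = 8, n_demos: int = 4,
                          rng: Optional[random.Random] = None) -> float:
    """Fraction of random demonstration contexts D for which (query, D) succeeds."""
    rng = rng or random.Random(0)
    hits = 0
    for _ in range(n_trials):
        demos = rng.sample(list(pool), n_demos)  # one random context D
        if run_trial(query, demos):
            hits += 1
    return hits / n_trials


def difficulty_tier(rate: float, easy_min: float = 0.75, hard_max: float = 0.25) -> str:
    """Bin an estimated success rate into a tier (cutoffs are assumptions)."""
    if rate >= easy_min:
        return "easy"
    if rate <= hard_max:
        return "hard"
    return "medium"


# Toy usage: a fake trial that succeeds only when demo "d0" is in the context.
pool = [f"d{i}" for i in range(10)]
rate = estimate_success_rate("q1", pool, lambda q, demos: "d0" in demos)
print(rate, difficulty_tier(rate))
```

In this reading, the tier labels would serve as training targets for the query-only router; the tier names and cutoffs are placeholders.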
๐Ÿ“ Abstract
In-context learning (ICL) is highly sensitive to which demonstrations appear in the prompt, but selecting them is expensive because the space of possible demonstration contexts and combinations is enormous. We argue that demonstration selection is "easier to judge than to find": predicting whether a specific query-context pair $(q,D)$ will succeed is cheaper and more general than searching for an optimal $D^\star$. Based on this insight, we propose DiSP, a sample-and-judge framework that stratifies queries by difficulty. DiSP runs random demonstration trials to estimate the success rate of each training query, trains a lightweight router to predict difficulty from the query, and trains level-specific judges for sampled demonstrations. At inference, DiSP performs stop-on-acceptance judging under an explicit budget, emitting diagnostic risk tags when no suitable context is found. Across five classification datasets with Llama 3-8B and Qwen 2.5-7B, DiSP achieves the best average accuracy, improving over strong learned selection baselines by up to 3.4%, while achieving up to a 23× end-to-end wall-clock speedup.
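The inference procedure described in the abstract amounts to a short loop: route the query to a difficulty level, then repeatedly sample a candidate context D and ask that level's judge whether (q, D) is likely to succeed, stopping at the first acceptance or when the budget runs out. The following Python sketch illustrates that loop under assumptions and is not DiSP's actual code: the budget is counted in judge calls, each judge returns a score compared against a fixed `threshold`, and `router`, `judges`, and the `no_accept:<level>` risk-tag format are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

Router = Callable[[str], str]                  # query -> difficulty level (hypothetical)
Judge = Callable[[str, Sequence[str]], float]  # (q, D) -> predicted success score (hypothetical)


@dataclass
class Selection:
    demos: Optional[List[str]]  # accepted context D, or None if nothing was accepted
    judge_calls: int            # budget actually spent
    risk_tag: Optional[str]     # diagnostic tag emitted when no context is accepted


def select_context(query: str, pool: Sequence[str], router: Router,
                   judges: Dict[str, Judge], budget: int = 16, n_demos: int = 4,
                   threshold: float = 0.5,
                   rng: Optional[random.Random] = None) -> Selection:
    rng = rng or random.Random(0)
    level = router(query)   # predict difficulty from the query only
    judge = judges[level]   # level-specific judge
    for calls in range(1, budget + 1):
        demos = rng.sample(list(pool), n_demos)  # sample a candidate context
        if judge(query, demos) >= threshold:     # accept and stop immediately
            return Selection(demos, calls, None)
    # Budget exhausted without acceptance: no context, plus a risk tag.
    return Selection(None, budget, f"no_accept:{level}")


# Toy usage: every query routes to "easy"; that judge accepts any context containing "d2".
pool = [f"d{i}" for i in range(10)]
judges = {"easy": lambda q, d: 1.0 if "d2" in d else 0.0}
result = select_context("q1", pool, lambda q: "easy", judges)
print(result)
```

Stopping at the first accepted context, rather than scoring every candidate and keeping the best, bounds the cost per query at `budget` judge calls in this sketch.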
Problem

Research questions and friction points this paper is trying to address.

in-context learning
demonstration selection
query-context pair
selection cost
combinatorial space
Innovation

Methods, ideas, or system contributions that make the work stand out.

in-context learning
demonstration selection
sample-and-judge
difficulty-aware routing
efficient inference
🔎 Similar Papers
Haochun Wang
PhD, Harbin Institute of Technology
NLP, Large Language Model, AI4Science
Chaofen Yang
Research Center for Social Computing and Interactive Robotics, Harbin Institute of Technology, China
Jiatong Liu
Research Center for Social Computing and Interactive Robotics, Harbin Institute of Technology, China
Jingbo Wang
Research Center for Social Computing and Interactive Robotics, Harbin Institute of Technology, China
Zewen Qiang
Research Center for Social Computing and Interactive Robotics, Harbin Institute of Technology, China
Sendong Zhao
Harbin Institute of Technology
BioNLP, Large Language Model
Bing Qin
Professor at Harbin Institute of Technology
Natural Language Processing, Information Extraction, Sentiment Analysis
Ting Liu
Research Center for Social Computing and Interactive Robotics, Harbin Institute of Technology, China