Semi-Supervised In-Context Learning: A Baseline Study

📅 2025-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prior work on in-context learning (ICL) predominantly relies on human-annotated demonstrations, overlooking the reliability and potential of self-generated labels. This paper introduces the first systematic semi-supervised ICL framework, comprising three steps: (1) large language model-driven self-labeling, (2) confidence-guided selection of high-quality pseudo-demonstrations, and (3) semi-supervised inference. On top of this framework, the paper establishes Naive-SemiICL as the first semi-supervised ICL baseline and proposes IterPSD, an iterative pseudo-label refinement method. Experiments show that Naive-SemiICL outperforms a 16-shot supervised baseline by 9.94% on average across 16 datasets, and IterPSD further improves classification accuracy by up to 6.8%. The paper also uncovers a scaling law: performance peaks with over 1,000 demonstrations in context. Core contributions: (i) formalizing semi-supervised ICL as a novel paradigm, (ii) proposing an iterative pseudo-demonstration selection and refinement method, and (iii) empirically validating its efficacy across diverse tasks.

📝 Abstract
Most existing work in data selection for In-Context Learning (ICL) has focused on constructing demonstrations from ground truth annotations, with limited attention given to selecting reliable self-generated annotations. In this work, we propose a three-step semi-supervised ICL framework: annotation generation, demonstration selection, and semi-supervised inference. Our baseline, Naive-SemiICL, which selects high-confidence self-generated demonstrations for ICL prompting, outperforms a 16-shot baseline by an average of 9.94% across 16 datasets. We further introduce IterPSD, an annotation approach that refines pseudo-demonstrations iteratively, achieving up to 6.8% additional gains in classification tasks. Lastly, we reveal a scaling law for semi-supervised ICL, where models achieve optimal performance with over 1,000 demonstrations.
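The first two steps of the framework can be sketched as follows. This is a minimal illustration, not the paper's implementation: `annotate`, `threshold`, and the tuple-based demonstration format are hypothetical placeholders standing in for the model's self-labeling call and the confidence cutoff.

```python
from typing import Callable, List, Tuple

def naive_semi_icl(
    labeled: List[Tuple[str, str]],
    unlabeled: List[str],
    annotate: Callable[[List[Tuple[str, str]], str], Tuple[str, float]],
    threshold: float = 0.9,
) -> List[Tuple[str, str]]:
    # Step 1 (annotation generation): self-label each unlabeled input
    # with the model, conditioning on the small labeled seed set.
    # Step 2 (demonstration selection): keep only pseudo-demonstrations
    # whose confidence clears the threshold; the returned pool is then
    # used as the ICL prompt for semi-supervised inference (step 3).
    demos = list(labeled)
    for x in unlabeled:
        label, confidence = annotate(labeled, x)  # hypothetical LLM call
        if confidence >= threshold:
            demos.append((x, label))
    return demos
```

With a stub annotator in place of a real model, only confident pseudo-labels survive:

```python
seed = [("great film", "pos")]

def stub_annotate(demos, x):
    return ("pos", 0.95) if "good" in x else ("neg", 0.5)

demos = naive_semi_icl(seed, ["good movie", "bad plot"], stub_annotate)
# "bad plot" is dropped: its confidence (0.5) is below the 0.9 threshold
```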
Problem

Research questions and friction points this paper is trying to address.

Improves In-Context Learning with self-generated annotations
Introduces IterPSD for refining pseudo-demonstrations iteratively
Reveals scaling law for optimal semi-supervised ICL performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Three-step semi-supervised ICL framework
Naive-SemiICL selects high-confidence self-generated demonstrations
IterPSD refines pseudo-demonstrations iteratively
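The iterative refinement idea behind IterPSD can be sketched as follows: annotate the unlabeled pool in chunks, fold accepted pseudo-demonstrations back into the prompt pool, and revisit low-confidence items with the enlarged pool. This is a hedged sketch of the iterative scheme, not the paper's algorithm; `annotate`, `chunk_size`, `max_rounds`, and the deferral policy are illustrative assumptions.

```python
from typing import Callable, List, Tuple

def iter_psd(
    labeled: List[Tuple[str, str]],
    unlabeled: List[str],
    annotate: Callable[[List[Tuple[str, str]], str], Tuple[str, float]],
    threshold: float = 0.9,
    chunk_size: int = 2,
    max_rounds: int = 5,
) -> List[Tuple[str, str]]:
    demos = list(labeled)
    pending = list(unlabeled)
    for _ in range(max_rounds):  # guard against items that never clear the bar
        if not pending:
            break
        chunk, pending = pending[:chunk_size], pending[chunk_size:]
        deferred = []
        for x in chunk:
            label, confidence = annotate(demos, x)  # hypothetical LLM call
            if confidence >= threshold:
                demos.append((x, label))  # accepted labels grow the demo pool
            else:
                deferred.append(x)  # revisit later with a larger pool
        pending.extend(deferred)
    return demos
```

The design choice worth noting is that later chunks are annotated against a demonstration pool enlarged by earlier rounds, which is what distinguishes the iterative scheme from a single-pass selection like Naive-SemiICL.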