🤖 AI Summary
Existing in-context learning (ICL) methods predominantly rely on semantic similarity to retrieve top-K exemplars, yet this often yields label-inconsistent demonstrations that impair generalization. We identify this issue as an implicit transductive label propagation problem and, for the first time, formulate ICL from a Bayesian perspective—jointly modeling concept-guided retrieval and label estimation under an error-bounded label propagation framework. Based on this formulation, we propose TopK-SD, a label-consistency-driven sampling method that jointly optimizes semantic similarity and label distribution modeling via synthetic data augmentation. Evaluated across multiple NLP benchmarks, TopK-SD consistently outperforms standard top-K retrieval, empirically validating the critical role of label consistency in ICL performance. Our work establishes a novel analytical paradigm for understanding the intrinsic mechanisms of ICL, bridging conceptual grounding with reliable label inference.
📝 Abstract
Large language models (LLMs) perform in-context learning (ICL) from only a handful of supervised examples, benefiting a wide range of natural language processing (NLP) tasks. A critical research focus is the selection of prompt demonstrations. Current approaches typically employ retrieval models to select the top-K most semantically similar examples as demonstrations. However, we argue that existing methods are limited because label consistency is not guaranteed during demonstration selection. This insight derives from a Bayesian view of ICL and from rethinking ICL as transductive label propagation. Treating ICL as a transductive learning method and incorporating latent concepts from the Bayesian view, we deduce that similar demonstrations guide the latent concept of the query, with their consistent labels serving as label estimates. Based on this understanding, we establish a label propagation framework that links label consistency with propagation error bounds. To model label consistency, we propose a data synthesis method that leverages both semantic and label information, and use TopK sampling with Synthetic Data (TopK-SD) to acquire demonstrations with consistent labels. TopK-SD outperforms the original TopK sampling on multiple benchmarks. Our work provides a new perspective for understanding the working mechanisms of ICL.
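To make the selection idea concrete, here is a minimal, hypothetical sketch of label-consistency-aware top-K retrieval: rank candidates by cosine similarity, take a small similarity pool, estimate the query's label as the pool's majority label, and keep only demonstrations agreeing with that estimate. The function name, the pool-then-filter heuristic, and all parameters are illustrative assumptions, not the paper's actual TopK-SD algorithm (which additionally relies on synthetic data).

```python
import numpy as np

def topk_label_consistent(query_emb, cand_embs, cand_labels, k=4, pool=8):
    """Hypothetical sketch of label-consistency-aware demonstration selection.

    Retrieves a pool of semantically similar candidates, estimates the query
    label as the pool's majority label, and keeps the k pool members whose
    labels agree with that estimate. Not the paper's exact TopK-SD procedure.
    """
    # Cosine similarity between the query and every candidate embedding.
    sims = cand_embs @ query_emb / (
        np.linalg.norm(cand_embs, axis=1) * np.linalg.norm(query_emb) + 1e-12
    )
    # Top-`pool` candidates by semantic similarity (plain top-K retrieval).
    pool_idx = np.argsort(-sims)[:pool]
    pool_labels = [cand_labels[i] for i in pool_idx]
    # Majority label over the pool serves as the label estimate for the query.
    majority = max(set(pool_labels), key=pool_labels.count)
    # Keep only label-consistent demonstrations, preserving similarity order.
    consistent = [i for i in pool_idx if cand_labels[i] == majority]
    return consistent[:k], majority
```

A toy call with 2-D embeddings: for a query `[1, 0]` against candidates `[[1, 0], [0.9, 0.1], [0.8, 0.2], [0, 1]]` labeled `['A', 'A', 'B', 'B']`, the top-3 similarity pool is candidates 0, 1, 2, the majority label is `'A'`, and the label-inconsistent candidate 2 is dropped from the final demonstration set.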