Learn to Select: Exploring Label Distribution Divergence for In-Context Demonstration Selection in Text Classification

📅 2025-11-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing in-context learning for text classification lacks principled methods for selecting informative demonstration examples. Method: This paper proposes a two-stage demonstration selection framework: (1) semantic similarity-based retrieval to obtain candidate examples; (2) label distribution alignment, which defines and quantifies the label distribution discrepancy between the test instance and each candidate via KL divergence, using prediction distributions estimated by a fine-tuned small BERT model, followed by Top-K filtering to construct the final demonstration set. The approach requires no parameter updates or gradients from the large language model, ensuring efficient and lightweight deployment. Contribution/Results: The method significantly outperforms state-of-the-art demonstration selection strategies across seven standard text classification benchmarks. Empirical analysis further reveals a strong positive correlation between large-model performance and the label prediction accuracy of the small BERT model, substantiating the efficacy of label distribution alignment as a proxy for demonstration quality.
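The first stage described above can be sketched as a simple cosine-similarity retrieval over sentence embeddings. This is a minimal illustration, not the paper's implementation; the function name, the embedding source, and the value of `k` are assumptions for the sketch.

```python
import numpy as np

def topk_semantic_candidates(test_emb, cand_embs, k):
    """Stage 1 (sketch): retrieve the k candidate demonstrations most
    similar to the test input by cosine similarity over embeddings.

    test_emb:  (d,) embedding of the test input
    cand_embs: (n, d) embeddings of the candidate pool
    Returns indices of the top-k candidates, most similar first.
    """
    # Normalize so the dot product equals cosine similarity.
    test_n = test_emb / np.linalg.norm(test_emb)
    cand_n = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    sims = cand_n @ test_n
    # Sort by descending similarity and keep the top k.
    return np.argsort(-sims)[:k]
```

In practice the embeddings would come from a sentence encoder; the second stage then re-ranks these candidates by label distribution divergence.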

📝 Abstract
In-context learning (ICL) for text classification, which uses a few input-label demonstrations to describe a task, has demonstrated impressive performance on large language models (LLMs). However, the selection of in-context demonstrations plays a crucial role and can significantly affect LLMs' performance. Most existing demonstration selection methods primarily focus on semantic similarity between test inputs and demonstrations, often overlooking the importance of label distribution alignment. To address this limitation, we propose a two-stage demonstration selection method, TopK + Label Distribution Divergence (L2D), which leverages a fine-tuned BERT-like small language model (SLM) to generate label distributions and calculate their divergence for both test inputs and candidate demonstrations. This enables the selection of demonstrations that are not only semantically similar but also aligned in label distribution with the test input. Extensive experiments across seven text classification benchmarks show that our method consistently outperforms previous demonstration selection strategies. Further analysis reveals a positive correlation between the performance of LLMs and the accuracy of the underlying SLMs used for label distribution estimation.
Problem

Research questions and friction points this paper is trying to address.

Addresses label distribution divergence in demonstration selection for text classification
Proposes two-stage method combining semantic similarity and label distribution alignment
Improves in-context learning performance by selecting semantically and distributionally aligned demonstrations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage demonstration selection combining TopK retrieval with label distribution divergence (L2D)
BERT-like SLM generates label distributions for alignment
Selects semantically similar and label-aligned demonstrations
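The second stage sketched below scores each retrieved candidate by the KL divergence between its SLM-predicted label distribution and that of the test input, then keeps the closest ones. This is an illustrative sketch under stated assumptions; the function names and the epsilon smoothing are not from the paper.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over discrete label distributions, with clipping
    to avoid log(0). eps is an assumption for numerical stability."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def select_by_label_divergence(test_dist, cand_dists, k):
    """Stage 2 (sketch): keep the k candidates whose predicted label
    distributions diverge least from the test input's distribution."""
    divs = [kl_divergence(test_dist, d) for d in cand_dists]
    return np.argsort(divs)[:k]
```

Candidates whose predicted label distribution matches the test input's are ranked first, operationalizing the label-alignment criterion on top of the semantic retrieval stage.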
Ye Jiang
College of Information Science and Technology, Qingdao University of Science and Technology
Taihang Wang
College of Information Science and Technology, Qingdao University of Science and Technology
Youzheng Liu
College of Information Science and Technology, Qingdao University of Science and Technology
Yimin Wang
College of Data Science, Qingdao University of Science and Technology
Yuhan Xia
School of Electronic Engineering and Computer Science, Queen Mary University of London
Yunfei Long
Michigan State University
Computer Vision · Sensor Fusion