In-Context Learning on a Budget: A Case Study in Token Classification

📅 2024-06-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses few-shot in-context learning (ICL) optimization under extremely limited annotation budgets, focusing on token-level classification tasks, which are expensive to annotate and under-studied in low-resource settings. We propose a realistic ICL paradigm in which the annotation budget serves as a hard constraint, and systematically evaluate the token-level annotation efficiency of active learning strategies—including uncertainty-, diversity-, and gradient-based sampling—against random sampling. Experiments span multiple tasks, models, and datasets. Results show that as few as 50–200 annotated tokens recover over 90% of the performance obtained with the entire training set; moreover, random sampling performs comparably to sophisticated active learning methods across most settings, with no statistically significant difference. Our key contribution is uncovering the diminishing marginal returns of small-scale, high-quality annotation and challenging the conventional assumption that "smarter" sampling inherently yields superior performance. This establishes a simple, robust, and empirically grounded baseline for efficient annotation in data-scarce scenarios.
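The budget-constrained selection setup compared in the summary can be sketched as follows. This is a hypothetical minimal illustration, not the paper's implementation: the function names, the entropy-based uncertainty criterion, and the toy probability vectors are all assumptions made for demonstration.

```python
import math
import random

def token_entropy(probs):
    """Shannon entropy of a token's predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_tokens(token_probs, budget, strategy="random", seed=0):
    """Pick `budget` token indices to annotate.

    token_probs: one label-probability vector per unlabeled token
    strategy: 'random' or 'uncertainty' (highest-entropy tokens first)
    """
    indices = list(range(len(token_probs)))
    if strategy == "random":
        return random.Random(seed).sample(indices, budget)
    # uncertainty sampling: spend the budget on the most ambiguous tokens
    return sorted(indices,
                  key=lambda i: token_entropy(token_probs[i]),
                  reverse=True)[:budget]

# toy model outputs: five tokens, three-way label distributions
probs = [
    [0.90, 0.05, 0.05],  # confident
    [0.34, 0.33, 0.33],  # near-uniform, most uncertain
    [0.60, 0.30, 0.10],
    [0.98, 0.01, 0.01],  # confident
    [0.40, 0.40, 0.20],  # uncertain
]
print(select_tokens(probs, budget=2, strategy="uncertainty"))  # → [1, 4]
```

The paper's finding is that, under small budgets, the random branch of this selector matches the uncertainty branch in downstream ICL performance.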

📝 Abstract
Few-shot in-context learning (ICL) typically assumes access to large annotated training sets. However, in many real-world scenarios, such as domain adaptation, there is only a limited budget to annotate a small number of samples, with the goal of maximizing downstream performance. We study various methods for selecting samples to annotate within a predefined budget, focusing on token classification tasks, which are expensive to annotate and are relatively less studied in ICL setups. Across various tasks, models, and datasets, we observe that no method significantly outperforms the others, with most yielding similar results, including random sample selection for annotation. Moreover, we demonstrate that a relatively small annotated sample pool can achieve performance comparable to using the entire training set. We hope that future work adopts our realistic paradigm, which takes the annotation budget into account.
Problem

Research questions and friction points this paper is trying to address.

Budget-Constrained Learning
Data Selection
Classification Tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Budget-Constrained Learning
Efficient Data Annotation
Performance Optimization