KITE: Kernelized and Information Theoretic Exemplars for In-Context Learning

📅 2025-09-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Conventional in-context learning (ICL) suffers from low-efficiency example selection, poor generalization, and insufficient diversity. Method: We propose the first information-theoretic, query-adaptive example selection framework. It models the information gain of candidate examples, formulates a near-submodular objective function, and incorporates kernel tricks to handle high-dimensional embedding spaces alongside optimal experimental design regularization to enhance structural awareness and diversity. The method optimizes a query-specific objective—linear reconstruction error minimization—via an efficient greedy algorithm. Contribution/Results: Evaluated on multiple low-resource classification tasks, our framework significantly outperforms standard nearest-neighbor retrieval baselines. It demonstrates superior accuracy, generalization, and example diversity under data-scarce conditions, validating its effectiveness in balancing these critical properties for ICL.

Technology Category

Application Category

📝 Abstract
In-context learning (ICL) has emerged as a powerful paradigm for adapting large language models (LLMs) to new and data-scarce tasks using only a few carefully selected task-specific examples presented in the prompt. However, given the limited context size of LLMs, a fundamental question arises: Which examples should be selected to maximize performance on a given user query? While nearest-neighbor-based methods like KATE have been widely adopted for this purpose, they suffer from well-known drawbacks in high-dimensional embedding spaces, including poor generalization and a lack of diversity. In this work, we study this problem of example selection in ICL from a principled, information theory-driven perspective. We first model an LLM as a linear function over input embeddings and frame the example selection task as a query-specific optimization problem: selecting a subset of exemplars from a larger example bank that minimizes the prediction error on a specific query. This formulation departs from traditional generalization-focused learning theoretic approaches by targeting accurate prediction for a specific query instance. We derive a principled surrogate objective that is approximately submodular, enabling the use of a greedy algorithm with an approximation guarantee. We further enhance our method by (i) incorporating the kernel trick to operate in high-dimensional feature spaces without explicit mappings, and (ii) introducing an optimal design-based regularizer to encourage diversity in the selected examples. Empirically, we demonstrate significant improvements over standard retrieval methods across a suite of classification tasks, highlighting the benefits of structure-aware, diverse example selection for ICL in real-world, label-scarce scenarios.
Problem

Research questions and friction points this paper is trying to address.

Selecting optimal examples for in-context learning
Overcoming limitations of nearest-neighbor methods
Maximizing prediction accuracy for specific queries
Innovation

Methods, ideas, or system contributions that make the work stand out.

Kernel trick for high-dimensional feature spaces
Information theory-driven submodular optimization
Optimal design regularizer for diversity
🔎 Similar Papers
No similar papers found.