Linear-Time Demonstration Selection for In-Context Learning via Gradient Estimation

📅 2025-08-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of efficiently selecting the top-k in-context demonstration examples for few-shot learning. We propose the first linear-time demonstration selection algorithm based on gradient estimation in the input embedding space. Our method’s core innovation lies in applying a first-order Taylor approximation of the model output with respect to input embeddings to quantify the influence of each individual demonstration on the target prediction—without requiring full forward passes. By integrating randomized subset sampling with influence-score aggregation, we avoid exhaustive inference over all candidate demonstrations. Evaluated on six benchmark datasets, our approach achieves an average prediction error below 1%, accelerates inference by 37.7× over exhaustive search, and improves average accuracy by 11% over current state-of-the-art methods on a 34B-parameter language model.

📝 Abstract
This paper introduces an algorithm to select demonstration examples for in-context learning of a query set. Given a set of $n$ examples, how can we quickly select $k$ out of $n$ to best serve as the conditioning for downstream inference? This problem has broad applications in prompt tuning and chain-of-thought reasoning. Since model weights remain fixed during in-context learning, previous work has sought to design methods based on the similarity of token embeddings. This work proposes a new approach based on gradients of the output taken in the input embedding space. Our approach estimates model outputs through a first-order approximation using the gradients. Then, we apply this estimation to multiple randomly sampled subsets. Finally, we aggregate the sampled subset outcomes to form an influence score for each demonstration, and select the $k$ most relevant examples. This procedure only requires pre-computing model outputs and gradients once, resulting in a linear-time algorithm relative to model and training set sizes. Extensive experiments across various models and datasets validate the efficiency of our approach. We show that the gradient estimation procedure yields approximations of full inference with less than $\mathbf{1}\%$ error across six datasets. This allows us to speed up subset selection methods that would otherwise run full inference by up to $\mathbf{37.7}\times$ on models with up to $34$ billion parameters, and to outperform existing selection methods based on input embeddings by $\mathbf{11}\%$ on average.
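The first-order approximation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the reference output `f0`, the gradient array `grad`, and the convention of zeroing a dropped demonstration's embedding are all simplifying assumptions made here for demonstration purposes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 16                           # toy sizes: 8 demonstrations, 16-dim embeddings

e0 = rng.normal(size=(n, d))           # embeddings of the full demonstration set
f0 = 0.3                               # model output at e0 (precomputed once)
grad = 0.01 * rng.normal(size=(n, d))  # d(output)/d(embeddings), precomputed once

def taylor_estimate(e):
    """First-order Taylor estimate: f(e) ~ f(e0) + <grad, e - e0>.

    No forward pass through the model is needed once f0 and grad are cached.
    """
    return f0 + float(np.sum(grad * (e - e0)))

# Estimate the output if demonstration 3 were dropped (embedding zeroed,
# a simplifying assumption for illustration).
e_drop = e0.copy()
e_drop[3] = 0.0
approx = taylor_estimate(e_drop)
```

Because `f0` and `grad` are computed once and reused for every candidate subset, evaluating a subset costs only a dot product, which is where the claimed linear-time behavior comes from.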
Problem

Research questions and friction points this paper is trying to address.

Selecting k demonstration examples from n candidates for in-context learning
Developing an efficient algorithm for demonstration selection via gradient estimation
Improving prompt tuning and chain-of-thought reasoning through better example selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gradient-based demonstration selection algorithm
Linear-time complexity via pre-computation
First-order approximation for output estimation
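The randomized subset sampling and influence-score aggregation from the pipeline above can be sketched as below. Everything here is a hypothetical stand-in: `estimated_score` is a placeholder for the paper's first-order output estimator, and the additive per-example `utility` exists only to make the toy example runnable.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, trials = 20, 4, 200          # candidates, selection budget, sampled subsets (toy values)

# Placeholder for the paper's gradient-based estimator of subset quality:
# here each example contributes an independent hidden utility.
utility = rng.normal(size=n)
def estimated_score(subset):
    return utility[subset].sum()

# Randomized subset sampling: accumulate each example's average
# score over the subsets it appears in, forming an influence score.
scores = np.zeros(n)
counts = np.zeros(n)
for _ in range(trials):
    subset = rng.choice(n, size=k, replace=False)
    s = estimated_score(subset)
    scores[subset] += s
    counts[subset] += 1

influence = scores / np.maximum(counts, 1)
top_k = np.argsort(influence)[-k:][::-1]   # select the k most influential examples
```

Since `estimated_score` never runs the model, the loop over sampled subsets is cheap; only the per-example influence scores are aggregated before the final top-k selection.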