🤖 AI Summary
This paper addresses the challenge of adapting large language models (LLMs) to new tasks with extremely limited labeled examples. The authors propose and systematically evaluate *context distillation*, a method that internalizes few-shot task demonstrations into model parameters, thereby expanding the set of effective in-context examples available at inference time. They conduct a unified empirical comparison of context distillation, in-context learning (ICL), and few-shot fine-tuning (FT) under identical experimental conditions, using OPT-family models and cross-domain evaluation on matched datasets from Mobach. Results show that context distillation matches ICL in in-domain accuracy and surpasses it in out-of-domain generalization, though it does not reach the performance of FT. Moreover, it achieves this with only 10–20% of the training data and computational cost required by FT, offering a compelling trade-off among strong performance, cross-domain generalization, and computational efficiency, and establishing its appeal for few-shot adaptation.
📝 Abstract
Large Language Models (LLMs) demonstrate proficiency across diverse tasks but often require targeted adaptation for specific applications. Various methods have been proposed to facilitate this adaptation, including few-shot fine-tuning, in-context learning, and context distillation. This paper specifically investigates context distillation, a method that extends the utility of task-specific examples by internalizing them into model parameters, thus augmenting the example set accessible to the model at inference. We conduct a comparative analysis of context distillation, in-context learning (ICL), and few-shot fine-tuning (FT), aiming to ascertain the efficacy of context distillation in adapting models using minimal in-context examples. Employing matched datasets from Mobach, our experiments leverage OPT models of various sizes. The results indicate that context distillation effectively adapts models, with student models attaining in-domain accuracy comparable to in-context learning. Although context distillation surpasses ICL in out-of-domain generalization, it does not achieve the performance levels of FT. However, its reduced dataset size and computational demands position context distillation as a viable alternative, especially for smaller datasets. Overall, this study presents context distillation as an efficient and potent method for customizing LLMs to specific tasks.
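The mechanism the abstract describes, internalizing in-context demonstrations into model parameters by training a student to imitate a context-conditioned teacher, can be illustrated with a deliberately tiny sketch. Everything below is hypothetical: a 1-D toy task and a hand-built "teacher" stand in for the paper's actual OPT models and language datasets.

```python
import math

# Toy sketch of context distillation (illustrative only; not the paper's setup).
# Teacher: a "model" that reads few-shot context before classifying a query.
# Student: a context-free logistic model trained to match the teacher's soft
# predictions, so the context ends up internalized in its parameters.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def teacher_prob(context, x):
    """P(label=1 | context, x): the teacher infers a decision threshold from
    the in-context examples (midpoint of the class means), applied softly."""
    pos = [xi for xi, y in context if y == 1]
    neg = [xi for xi, y in context if y == 0]
    thr = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0
    return sigmoid(4.0 * (x - thr))

# Few-shot demonstrations: the "context" to be distilled (implied threshold 2.0).
context = [(0.0, 0), (1.0, 0), (3.0, 1), (4.0, 1)]

# Unlabeled queries; the student learns only from the teacher's soft targets.
queries = [i * 0.1 for i in range(41)]  # grid over [0, 4]

# Train the context-free student p(x) = sigmoid(w*x + b) by full-batch
# gradient descent on cross-entropy against the teacher's probabilities
# (for a sigmoid output, dLoss/dlogit = prediction - target).
w, b, lr = 0.0, 0.0, 0.5
for _ in range(5000):
    gw = gb = 0.0
    for x in queries:
        err = sigmoid(w * x + b) - teacher_prob(context, x)
        gw += err * x
        gb += err
    w -= lr * gw / len(queries)
    b -= lr * gb / len(queries)

def student_prob(x):
    return sigmoid(w * x + b)  # no context needed at inference time

# The student now reproduces the teacher's decision rule without the context.
print(student_prob(0.5), student_prob(3.5))
```

The practical appeal mirrors the abstract's claim: after distillation the student answers queries without re-reading the demonstrations, so the prompt budget is freed and more examples can be internalized than would fit in a single context window.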