AI Summary
This study investigates the working mechanisms of in-context learning (ICL) in large language models (LLMs) and systematically compares their behavior with that of traditional supervised classifiers, such as logistic regression and k-nearest neighbors (kNN), on text classification tasks. Through empirical analysis across six datasets involving three LLMs and two types of classifiers, the work reveals for the first time that ICL behaves similarly to kNN when provided with highly relevant demonstrations, yet significantly outperforms conventional methods when demonstrations are of low relevance. This indicates that LLMs effectively leverage their parametric prior knowledge to compensate for poor demonstration quality. The findings highlight the unique generalization capability of LLMs, which transcends the limitations of non-parametric matching mechanisms inherent in traditional classifiers.
Abstract
In-context learning (ICL) has become a prominent paradigm to rapidly customize LLMs to new tasks without fine-tuning. However, despite the empirical evidence of its usefulness, we still do not truly understand how ICL works. In this paper, we compare the behavior of in-context learning with supervised classifiers trained on ICL demonstrations to investigate three research questions: (1) Do LLMs with ICL behave similarly to classifiers trained on the same examples? (2) If so, which classifiers are closer, those based on gradient descent (GD) or those based on k-nearest neighbors (kNN)? (3) When they do not behave similarly, what conditions are associated with differences in behavior? Using text classification as a use case, with six datasets and three LLMs, we observe that LLMs behave similarly to these classifiers when the relevance of demonstrations is high. On average, ICL is closer to kNN than logistic regression, giving empirical evidence that the attention mechanism behaves more similarly to kNN than GD. However, when demonstration relevance is low, LLMs perform better than these classifiers, likely because LLMs can back off to their parametric memory, a luxury these classifiers do not have.
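The comparison described above, training a kNN classifier and a gradient-descent logistic regression on the same demonstrations and checking whether their predictions track the LLM's, can be illustrated with a minimal sketch. This is not the paper's code: it assumes demonstrations are already represented as fixed-length embedding vectors, and all function names and the toy data are illustrative.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Label query x by majority vote among its k most similar
    demonstrations, using cosine similarity (the non-parametric baseline)."""
    sims = (X_train @ x) / (
        np.linalg.norm(X_train, axis=1) * np.linalg.norm(x) + 1e-9)
    top = np.argsort(-sims)[:k]
    labels, counts = np.unique(y_train[top], return_counts=True)
    return int(labels[np.argmax(counts)])

def logreg_fit_predict(X_train, y_train, x, lr=0.5, steps=200):
    """Binary logistic regression fit by plain gradient descent on the
    demonstrations (the GD-based baseline), then applied to query x."""
    w = np.zeros(X_train.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X_train @ w + b)))  # sigmoid
        w -= lr * (X_train.T @ (p - y_train)) / len(y_train)
        b -= lr * np.mean(p - y_train)
    return int((x @ w + b) > 0)

# Toy "demonstration set": two well-separated 2-D clusters, one per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.3, (10, 2)),
               rng.normal([-2, -2], 0.3, (10, 2))])
y = np.array([1] * 10 + [0] * 10)
query = np.array([1.5, 1.8])  # a query near the positive cluster

print(knn_predict(X, y, query), logreg_fit_predict(X, y, query))
```

On such clean, highly relevant demonstrations both baselines agree; the paper's question is whether the LLM's ICL prediction agrees with them too, and which baseline it tracks more closely when the demonstrations degrade.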