Demonstrations, CoT, and Prompting: A Theoretical Analysis of ICL

📅 2026-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing theories struggle to explain, under weak assumptions, how factors such as demonstration selection, chain-of-thought (CoT) reasoning, the number of demonstrations, and prompt templates influence generalization in in-context learning (ICL). This work proposes a unified theoretical framework that, under mild assumptions, links demonstration quality, the model’s intrinsic ICL capability, and distribution shift to an upper bound on test loss, while modeling CoT as a task decomposition mechanism. By integrating Lipschitz-based generalization bounds, task decomposition analysis, and distribution shift metrics, the framework quantifies—for the first time—the impact of demonstration quality on generalization, identifies conditions under which CoT improves performance, and characterizes how prompt template sensitivity varies with the number of demonstrations. Both theoretical and empirical results elucidate how pretraining, CoT, and prompting jointly enable generalization to unseen tasks.
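To make the summary concrete, a schematic form of such a bound can be written as follows. This is an illustrative sketch only, not the paper's exact statement; the symbols (loss terms, Lipschitz constant $K$, shift metric $d$, quality term $\varepsilon$) are assumptions for exposition:

$$
\mathcal{L}_{\text{test}} \;\le\; \underbrace{\mathcal{L}_{\text{ICL}}}_{\text{intrinsic ICL capability}} \;+\; \underbrace{K \cdot d\!\left(P_{\text{test}},\, P_{\text{pretrain}}\right)}_{\text{Lipschitz constant} \times \text{distribution shift}} \;+\; \underbrace{\varepsilon_{\text{demo}}}_{\text{demonstration quality}}
$$

Each term mirrors one of the three factors the framework identifies: the model's pretrained capability, the shift between test and pretraining distributions scaled by a Lipschitz constant, and a penalty that shrinks as demonstration quality improves.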

📝 Abstract
In-Context Learning (ICL) enables pretrained LLMs to adapt to downstream tasks by conditioning on a small set of input-output demonstrations, without any parameter updates. Although there have been many theoretical efforts to explain how ICL works, most either rely on strong architectural or data assumptions, or fail to capture the impact of key practical factors such as demonstration selection, Chain-of-Thought (CoT) prompting, the number of demonstrations, and prompt templates. We address this gap by establishing a theoretical analysis of ICL under mild assumptions that links these design choices to generalization behavior. We derive an upper bound on the ICL test loss, showing that performance is governed by (i) the quality of selected demonstrations, quantified by Lipschitz constants of the ICL loss along paths connecting test prompts to pretraining samples, (ii) an intrinsic ICL capability of the pretrained model, and (iii) the degree of distribution shift. Within the same framework, we analyze CoT prompting as inducing a task decomposition and show that it is beneficial when demonstrations are well chosen at each substep and the resulting subtasks are easier to learn. Finally, we characterize how the sensitivity of ICL performance to prompt templates varies with the number of demonstrations. Together, our study shows that pretraining equips the model with the ability to generalize beyond observed tasks; CoT enables the model to compose simpler subtasks into more complex ones; and demonstrations and instructions enable it to retrieve similar tasks, or simpler ones that can be composed into more complex tasks, jointly supporting generalization to unseen tasks. All theoretical insights are corroborated by experiments.
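Point (i) of the bound ties performance to how close the selected demonstrations are to what the model saw during pretraining. A common practical proxy for this is retrieval by embedding similarity: pick the candidate demonstrations whose embeddings are nearest the test prompt's. The sketch below illustrates that idea; it is a minimal illustration, not the paper's method, and the toy embeddings and field names (`text`, `emb`) are assumptions:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def select_demonstrations(test_emb, pool, k=2):
    """Return the k pool entries most similar to the test prompt embedding."""
    ranked = sorted(pool, key=lambda d: cosine(test_emb, d["emb"]), reverse=True)
    return ranked[:k]

# Toy 2-D embeddings: two arithmetic examples and one factual-recall example.
pool = [
    {"text": "2+2=4", "emb": [1.0, 0.0]},
    {"text": "capital of France is Paris", "emb": [0.0, 1.0]},
    {"text": "3+5=8", "emb": [0.9, 0.1]},
]

# An arithmetic-flavored test prompt retrieves the two arithmetic demonstrations.
best = select_demonstrations([1.0, 0.05], pool, k=2)
print([d["text"] for d in best])  # → ['2+2=4', '3+5=8']
```

Under the paper's framework, better-matched demonstrations correspond to smaller Lipschitz constants along the path from the test prompt to pretraining samples, and hence a tighter test-loss bound.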
Problem

Research questions and friction points this paper is trying to address.

In-Context Learning
Chain-of-Thought Prompting
Demonstration Selection
Prompt Templates
Generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

In-Context Learning
Chain-of-Thought Prompting
Generalization Bound
Demonstration Selection
Prompt Template
Xuhan Tong
Department of Computer Sciences, University of Wisconsin-Madison
Yuchen Zeng
Microsoft Research
Machine Learning · Artificial Intelligence · Algorithms
Jiawei Zhang
Department of Computer Sciences, University of Wisconsin-Madison