AI Summary
This work investigates the mechanisms of in-context learning (ICL) when tasks share a common structure, using low-rank linear regression as a canonical setting. Methodologically, we analyze a linear attention model trained on such tasks and, in the high-dimensional limit, employ random matrix theory and statistical learning analysis to precisely characterize both the distribution of ICL predictions and the generalization error. Our key contributions are threefold: (i) we show that statistical fluctuations in finite pretraining data induce an implicit regularization in ICL; (ii) we demonstrate that task structure governs generalization behavior and produces a sharp phase transition in the generalization error; and (iii) we provide a theoretical framework for understanding how Transformers "learn to learn" task structure. Together, these results clarify how shared task structure and attention mechanisms jointly shape generalization in ICL.
Abstract
In-context learning (ICL) is a key building block of modern large language models, yet its theoretical mechanisms remain poorly understood. It is particularly mysterious how ICL operates in real-world applications where tasks have a common structure. In this work, we address this problem by analyzing a linear attention model trained on low-rank regression tasks. Within this setting, we precisely characterize the distribution of predictions and the generalization error in the high-dimensional limit. Moreover, we find that statistical fluctuations in finite pre-training data induce an implicit regularization. Finally, we identify a sharp phase transition of the generalization error governed by task structure. These results provide a framework for understanding how transformers learn to learn the task structure.
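The setting described above can be illustrated with a small simulation. The sketch below is not the paper's model or its analysis. It assumes a commonly used reduced form of linear attention, ŷ = x_qᵀ Γ (1/n) Σᵢ yᵢ xᵢ, restricts Γ to a diagonal matrix, and takes the shared task subspace to be the first `r` coordinate axes. All function names, dimensions, and the noise-free labels are hypothetical choices made for illustration.

```python
import random

def sample_task(d, r, rng):
    # Low-rank task: the weight vector lies in the span of the first r axes
    # (an illustrative choice of shared subspace, not taken from the paper).
    z = [rng.gauss(0.0, 1.0) / r ** 0.5 for _ in range(r)]
    return z + [0.0] * (d - r)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def icl_predict(xs, ys, x_query, gamma_diag):
    # Assumed reduced linear-attention readout:
    #   y_hat = x_q^T Gamma (1/n) sum_i y_i x_i,
    # with Gamma restricted to a diagonal matrix for simplicity.
    n = len(xs)
    d = len(x_query)
    h = [sum(y * x[j] for x, y in zip(xs, ys)) / n for j in range(d)]
    return sum(g * q * hj for g, q, hj in zip(gamma_diag, x_query, h))

def mean_sq_error(gamma_diag, d, r, n, trials, seed=0):
    # Monte Carlo estimate of the squared prediction error on the query,
    # averaged over freshly sampled tasks and contexts.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        w = sample_task(d, r, rng)
        xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
        ys = [dot(w, x) for x in xs]  # noise-free labels (assumption)
        x_q = [rng.gauss(0.0, 1.0) for _ in range(d)]
        total += (icl_predict(xs, ys, x_q, gamma_diag) - dot(w, x_q)) ** 2
    return total / trials

d, r, n = 20, 2, 40
gamma_isotropic = [1.0] * d                     # ignores task structure
gamma_structured = [1.0] * r + [0.0] * (d - r)  # projects onto the task subspace
err_isotropic = mean_sq_error(gamma_isotropic, d, r, n, trials=500)
err_structured = mean_sq_error(gamma_structured, d, r, n, trials=500)
print(err_isotropic, err_structured)
```

In this toy setup, a Γ aligned with the task subspace discards the estimation noise in the `d - r` irrelevant directions, so its error should come out noticeably lower than that of the isotropic choice. This only shows why knowing the task structure helps. It does not reproduce the implicit regularization or the phase transition reported in the abstract.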