Learning Linear Regression with Low-Rank Tasks in-Context

📅 2025-10-06
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work investigates how in-context learning (ICL) operates when tasks share a common structure, using low-rank linear regression as a canonical setting. Methodologically, we analyze a linear attention model trained on low-rank regression tasks and, in the high-dimensional limit, use random matrix theory and statistical learning analysis to precisely characterize both the distribution of ICL predictions and the generalization error. Our key contributions are threefold: (i) we show that statistical fluctuations in finite pre-training data induce an implicit regularization; (ii) we identify a sharp phase transition of the generalization error governed by task structure; and (iii) we provide a theoretical framework for understanding how transformers “learn to learn” the task structure.

📝 Abstract
In-context learning (ICL) is a key building block of modern large language models, yet its theoretical mechanisms remain poorly understood. It is particularly mysterious how ICL operates in real-world applications where tasks have a common structure. In this work, we address this problem by analyzing a linear attention model trained on low-rank regression tasks. Within this setting, we precisely characterize the distribution of predictions and the generalization error in the high-dimensional limit. Moreover, we find that statistical fluctuations in finite pre-training data induce an implicit regularization. Finally, we identify a sharp phase transition of the generalization error governed by task structure. These results provide a framework for understanding how transformers learn to learn the task structure.
Problem

Research questions and friction points this paper is trying to address.

Understanding in-context learning mechanisms in transformers with structured tasks
Analyzing generalization error and predictions in low-rank regression settings
Identifying phase transitions in generalization error governed by task structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear attention model trained on low-rank regression tasks
Statistical fluctuations in finite pre-training data induce implicit regularization
Sharp phase transition of the generalization error governed by task structure
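
The setting named above, a linear attention model trained on low-rank regression tasks, can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration rather than the paper's construction: task vectors are drawn as w = U z inside a shared r-dimensional subspace, the model is the commonly used reduced linear-attention form x_query^T Γ (X^T y / n), and Γ is fitted by least squares with a small explicit ridge term added only for numerical stability. The function names (`random_subspace`, `sample_tasks`, `fit_gamma`, `demo`) and all parameter values are hypothetical.

```python
import numpy as np

def random_subspace(d, r, rng):
    """Orthonormal basis (d x r) of a random r-dimensional subspace (assumed task structure)."""
    U, _ = np.linalg.qr(rng.standard_normal((d, r)))
    return U

def sample_tasks(U, n_tasks, rng):
    """Task vectors w = U z, so every task lies in the shared low-rank subspace."""
    Z = rng.standard_normal((n_tasks, U.shape[1]))
    return Z @ U.T  # shape (n_tasks, d)

def sample_context(w, n_points, noise_std, rng):
    """Noisy linear-regression examples (x_i, y_i) for one task."""
    d = w.shape[0]
    X = rng.standard_normal((n_points, d)) / np.sqrt(d)
    y = X @ w + noise_std * rng.standard_normal(n_points)
    return X, y

def linear_attention_predict(Gamma, X, y, x_query):
    """Reduced linear-attention prediction: x_query^T Gamma (X^T y / n)."""
    h = X.T @ y / X.shape[0]
    return x_query @ Gamma @ h

def fit_gamma(W, n_ctx, noise_std, ridge, rng):
    """Fit Gamma by ridge regression on one context/query pair per pretraining task."""
    d = W.shape[1]
    feats, targets = [], []
    for w in W:
        X, y = sample_context(w, n_ctx + 1, noise_std, rng)
        h = X[:-1].T @ y[:-1] / n_ctx
        # The prediction is linear in Gamma: <outer(x_query, h), Gamma>.
        feats.append(np.outer(X[-1], h).ravel())
        targets.append(y[-1])
    F, t = np.asarray(feats), np.asarray(targets)
    gamma = np.linalg.solve(F.T @ F + ridge * np.eye(d * d), F.T @ t)
    return gamma.reshape(d, d)

def demo(seed=0, d=8, r=2, n_ctx=32, n_train=2000, n_test=500, noise_std=0.1):
    """Return (test MSE of the fitted model, MSE of always predicting zero)."""
    rng = np.random.default_rng(seed)
    U = random_subspace(d, r, rng)
    Gamma = fit_gamma(sample_tasks(U, n_train, rng), n_ctx, noise_std, 1e-6, rng)
    sq_err, sq_y = [], []
    for w in sample_tasks(U, n_test, rng):
        X, y = sample_context(w, n_ctx + 1, noise_std, rng)
        pred = linear_attention_predict(Gamma, X[:-1], y[:-1], X[-1])
        sq_err.append((pred - y[-1]) ** 2)
        sq_y.append(y[-1] ** 2)
    return float(np.mean(sq_err)), float(np.mean(sq_y))

if __name__ == "__main__":
    mse, baseline = demo()
    print(f"in-context test MSE: {mse:.4f}, zero-predictor MSE: {baseline:.4f}")
```

Reducing `n_train` in `demo` is one way to probe, informally, how a limited number of pretraining tasks changes the fitted Γ. The sketch does not reproduce the paper's high-dimensional analysis or its phase transition.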