Learning Linear Regression with Low-Rank Tasks in-Context

📅 2025-10-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies the mechanisms of in-context learning (ICL) when tasks share a common structure, using low-rank linear regression as a canonical setting. The authors analyze a linear attention model and, in the high-dimensional limit, use random matrix theory and statistical learning analysis to characterize both the distribution of ICL predictions and the generalization error. The key findings are threefold: (i) statistical fluctuations in finite pre-training data induce an implicit regularization; (ii) task structure governs generalization and produces a sharp phase transition in the generalization error; and (iii) the analysis provides a framework for understanding how transformers "learn to learn" task structure.

📝 Abstract

In-context learning (ICL) is a key building block of modern large language models, yet its theoretical mechanisms remain poorly understood. It is particularly mysterious how ICL operates in real-world applications where tasks have a common structure. In this work, we address this problem by analyzing a linear attention model trained on low-rank regression tasks. Within this setting, we precisely characterize the distribution of predictions and the generalization error in the high-dimensional limit. Moreover, we find that statistical fluctuations in finite pre-training data induce an implicit regularization. Finally, we identify a sharp phase transition of the generalization error governed by task structure. These results provide a framework for understanding how transformers learn to learn the task structure.
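To make the setting concrete, here is a minimal NumPy sketch of low-rank regression tasks and a linear-attention-style readout. Everything in it is an assumption for illustration rather than the paper's construction: the function names are hypothetical, tasks are drawn as w = U z from an r-dimensional subspace, inputs are standard Gaussian, and the predictor uses the reduced form x_q^T Γ (1/n) Σ y_i x_i that often appears in analyses of linear attention. The matrix Γ is fixed by hand here, not pre-trained, so the sketch only contrasts a structure-agnostic choice (identity) with a structure-aware one (the projector onto the task subspace).

```python
import numpy as np


def sample_low_rank_tasks(rng, U, n_tasks):
    """Draw task vectors w = U z with z ~ N(0, I_r), so each w lies in span(U)."""
    r = U.shape[1]
    z = rng.standard_normal((n_tasks, r))
    return z @ U.T  # shape (n_tasks, d)


def sample_prompts(rng, W, n_ctx, noise_std=0.1):
    """For each task, draw n_ctx labelled context pairs plus one query pair."""
    n_tasks, d = W.shape
    X = rng.standard_normal((n_tasks, n_ctx, d))
    y = np.einsum("tnd,td->tn", X, W)
    y += noise_std * rng.standard_normal((n_tasks, n_ctx))
    x_query = rng.standard_normal((n_tasks, d))
    y_query = np.einsum("td,td->t", x_query, W)  # noiseless query target
    return X, y, x_query, y_query


def linear_attention_predict(Gamma, X, y, x_query):
    """Assumed reduced readout: x_q^T Gamma (1/n) sum_i y_i x_i, per task."""
    n_ctx = X.shape[1]
    h = np.einsum("tn,tnd->td", y, X) / n_ctx  # context summary, shape (n_tasks, d)
    return np.einsum("td,de,te->t", x_query, Gamma, h)


def compare_errors(d=40, r=2, n_ctx=80, n_tasks=1000, seed=0):
    """Mean squared query error for two hand-picked choices of Gamma."""
    rng = np.random.default_rng(seed)
    # Orthonormal basis of a random r-dimensional task subspace (hypothetical choice).
    U, _ = np.linalg.qr(rng.standard_normal((d, r)))
    W = sample_low_rank_tasks(rng, U, n_tasks)
    X, y, x_query, y_query = sample_prompts(rng, W, n_ctx)
    candidates = {"identity": np.eye(d), "projector": U @ U.T}
    errors = {}
    for name, Gamma in candidates.items():
        pred = linear_attention_predict(Gamma, X, y, x_query)
        errors[name] = float(np.mean((pred - y_query) ** 2))
    return errors


if __name__ == "__main__":
    for name, err in compare_errors().items():
        print(f"{name}: {err:.4f}")
```

In this toy setup the projector readout should give a lower query error than the identity, because it discards the fluctuations of the context summary that fall outside the task subspace. It does not reproduce the paper's pre-training procedure, its implicit regularization result, or the phase transition.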

Problem

Research questions and friction points this paper is trying to address.

Understanding in-context learning mechanisms in transformers with structured tasks

Analyzing generalization error and predictions in low-rank regression settings

Identifying phase transitions in generalization error governed by task structure

Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear attention model trained on low-rank regression tasks

Statistical fluctuations in finite pre-training data induce implicit regularization

Sharp phase transition of the generalization error governed by task structure

🔎 Similar Papers

Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits