Transformer learns the cross-task prior and regularization for in-context learning

📅 2025-05-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the in-context learning (ICL) mechanism of Transformers in underdetermined inverse linear regression (ILR), aiming to elucidate how they implicitly infer high-dimensional unknown weight vectors from limited contextual examples. We introduce a linear Transformer model and conduct a rigorous theoretical analysis grounded in implicit regularization theory, complemented by comprehensive numerical experiments. Our key contributions are: (i) the first demonstration that Transformers can adaptively learn task-specific prior distributions across tasks and perform implicit regularization—departing fundamentally from explicit regularization paradigms such as ridge regression; (ii) a precise characterization of the necessary condition for successful learning: the task dimension must be less than the context length; and (iii) a tight error bound showing that estimation error scales linearly with noise level, the dimension-to-context ratio, and the condition number of the input matrix—consistently outperforming classical regularized estimators. This work establishes the first rigorous analytical framework for understanding ICL in inverse problems.

Technology Category

Application Category

📝 Abstract
Transformers have shown a remarkable ability for in-context learning (ICL), making predictions based on contextual examples. However, while theoretical analyses have explored this prediction capability, the nature of the inferred context and its utility for downstream predictions remain open questions. This paper aims to address these questions by examining ICL for inverse linear regression (ILR), where context inference can be characterized by unsupervised learning of underlying weight vectors. Focusing on the challenging scenario of rank-deficient inverse problems, where context length is smaller than the number of unknowns in the weight vectors and regularization is necessary, we introduce a linear transformer to learn the inverse mapping from contextual examples to the underlying weight vector. Our findings reveal that the transformer implicitly learns both a prior distribution and an effective regularization strategy, outperforming traditional ridge regression and regularization methods. A key insight is the necessity of low task dimensionality relative to the context length for successful learning. Furthermore, we numerically verify that the error of the transformer estimator scales linearly with the noise level, the ratio of task dimension to context length, and the condition number of the input data. These results not only demonstrate the potential of transformers for solving ill-posed inverse problems, but also provide a new perspective towards understanding the knowledge extraction mechanism within transformers.
Problem

Research questions and friction points this paper is trying to address.

Understanding how transformers infer context for in-context learning
Examining ICL for rank-deficient inverse linear regression problems
Learning implicit prior and regularization strategies in transformers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear transformer learns inverse mapping for ICL
Implicitly learns prior distribution and regularization
Outperforms ridge regression in rank-deficient scenarios
🔎 Similar Papers
No similar papers found.
Fei Lu
Fei Lu
Johns Hopkins University
applied probabilitystatistical learninginverse problemsdata assimilation
Y
Yue Yu
Department of Mathematics, Lehigh University, Bethlehem, PA 18015, USA