Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer

📅 2026-05-06
📈 Citations: 0
Influential: 0

🤖 AI Summary
Theoretical understanding of in-context learning (ICL) in Transformers has largely been confined to linear settings; this work extends it to nonlinear regression. The study interprets the attention mechanism as an explicit feature constructor, which allows Transformer architectures to be designed so that they generate nonlinear basis functions, such as polynomials or splines, and perform end-to-end in-context nonlinear regression on those features. Building on this framework, the authors derive finite-sample generalization error bounds that depend explicitly on context length and training set size. Experiments on synthetic regression tasks are used to validate the theoretical predictions.
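To make the "attention as feature constructor" idea concrete, here is a minimal sketch of how a bilinear query-key interaction can produce a quadratic feature. It is an illustration under strong assumptions, not the paper's construction: the softmax is dropped, each token is embedded as [1, x_i], and the weights `W_q` and `W_k` and the helper `bilinear_scores` are hypothetical choices made only for this example.

```python
import numpy as np

def bilinear_scores(tokens, W_q, W_k):
    """Unnormalized attention scores (no softmax): (tokens W_q)(tokens W_k)^T."""
    return (tokens @ W_q) @ (tokens @ W_k).T

# Scalar inputs, each embedded as the token [1, x_i] (assumed embedding).
x = np.array([0.5, -1.0, 2.0])
tokens = np.stack([np.ones_like(x), x], axis=1)

# Hypothetical weights that pick out the x coordinate for query and key.
W_q = np.array([[0.0], [1.0]])
W_k = np.array([[0.0], [1.0]])

scores = bilinear_scores(tokens, W_q, W_k)   # scores[i, j] = x_i * x_j
quadratic_feature = np.diag(scores)          # self-interaction gives x_i^2

# Appending the new column yields the degree-2 polynomial basis [1, x, x^2].
features = np.column_stack([tokens, quadratic_feature])
print(features)
```

Repeating such an interaction on the enlarged tokens would raise the degree further, which is one plausible way to read "generating polynomial bases"; the paper's actual architecture may differ.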
📝 Abstract
Pre-trained transformers are able to learn from examples provided as part of the prompt without any weight updates, a remarkable ability known as in-context learning (ICL). Despite its demonstrated efficacy across various domains, the theoretical understanding of ICL is still developing. Whereas most existing theory has focused on linear models, we study ICL in the nonlinear regression setting. Through the interaction mechanism in attention, we explicitly construct transformer networks to realize nonlinear features, such as polynomial or spline bases, which span a wide class of functions. Based on this construction, we establish a framework to analyze end-to-end in-context nonlinear regression with the constructed features. Our theory provides finite-sample generalization error bounds in terms of context length and training set size. We numerically validate the theory on synthetic regression tasks.
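The regression problem described in the abstract can be sketched outside of any transformer: given context pairs (x_i, y_i) in the prompt, fit a model on nonlinear features of the inputs and predict at a query point, with no weight updates to a pre-trained network. The sketch below uses a polynomial basis and ordinary least squares as a stand-in for the in-context estimator; the function names, the cubic target, the degree, and the context length are all assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def poly_features(x, degree):
    """Polynomial basis [1, x, ..., x^degree] for a 1-D array of inputs."""
    return np.vander(x, degree + 1, increasing=True)

def in_context_predict(x_ctx, y_ctx, x_query, degree=3):
    """Least-squares fit on the context examples, then predict at the query."""
    Phi = poly_features(x_ctx, degree)
    coef, *_ = np.linalg.lstsq(Phi, y_ctx, rcond=None)
    return poly_features(x_query, degree) @ coef

# Hypothetical noiseless cubic task; the "prompt" holds 20 context examples.
rng = np.random.default_rng(0)
target = lambda x: 1.0 - 2.0 * x + 0.5 * x ** 3
x_ctx = rng.uniform(-1.0, 1.0, size=20)
y_ctx = target(x_ctx)

x_query = np.array([0.3])
pred = in_context_predict(x_ctx, y_ctx, x_query)
print(pred, target(x_query))
```

With noisy labels, the gap between `pred` and the target would be expected to shrink as the context grows, which is the kind of dependence on context length that the paper's bounds are stated to quantify.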
Problem

Research questions and friction points this paper is trying to address.

in-context learning
nonlinear regression
transformers
attention mechanism
generalization error
Innovation

Methods, ideas, or system contributions that make the work stand out.

in-context learning
nonlinear regression
attention mechanism
transformer
generalization error