🤖 AI Summary
Prior work fails to characterize how geometric priors on structured manifolds affect in-context learning (ICL) generalization, particularly for Hölder function regression.
Method: We establish a theoretical connection between Transformer self-attention and manifold kernel methods, integrating tools from differential geometry, kernel regression, and attention analysis.
Contribution/Results: We derive a generalization error bound that depends explicitly on the prompt length, the number of training tasks, and the manifold curvature, and is governed by the manifold's intrinsic dimension rather than the ambient dimension. With sufficiently many training tasks, the bound attains the minimax rate for Hölder regression on manifolds, whose sample complexity scales exponentially with the intrinsic dimension; this gives the first quantitative characterization of the coupled effect of geometric complexity and task scale on ICL generalization. The result yields a rigorous theoretical foundation for interpretable learning with large models on structured manifolds.
📝 Abstract
While in-context learning (ICL) has achieved remarkable success in natural language and vision domains, its theoretical understanding, particularly in the context of structured geometric data, remains unexplored. In this work, we initiate a theoretical study of ICL for regression of Hölder functions on manifolds. By establishing a novel connection between the attention mechanism and classical kernel methods, we derive generalization error bounds in terms of the prompt length and the number of training tasks. When a sufficient number of training tasks is observed, transformers achieve the minimax regression rate for Hölder functions on manifolds, which scales exponentially with the intrinsic dimension of the manifold rather than the ambient dimension. Our result also characterizes how the generalization error scales with the number of training tasks, shedding light on the complexity of transformers as in-context algorithm learners. Our findings provide foundational insights into the role of geometry in ICL and novel tools for studying ICL of nonlinear models.
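The attention-kernel connection invoked above can be made concrete in a few lines: a single softmax-attention read over the prompt, with keys set to the prompt inputs and values set to their labels, computes exactly a Nadaraya-Watson kernel regression estimate with an exponential kernel. The sketch below is illustrative only; the dot-product kernel, the scale parameter, and the toy data are assumptions for exposition, not the paper's exact construction.

```python
import numpy as np

def softmax_attention(query, keys, values, scale=1.0):
    """One attention read: softmax-weighted average of the values."""
    scores = scale * keys @ query            # similarity of the query to each key
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ values

def nadaraya_watson(x, xs, ys, scale=1.0):
    """Kernel regression with the exponential kernel K(x, x') = exp(scale * <x, x'>)."""
    scores = scale * xs @ x
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ ys

rng = np.random.default_rng(0)
xs = rng.normal(size=(8, 3))  # prompt inputs (8 in-context examples in R^3)
ys = rng.normal(size=8)       # prompt labels
x = rng.normal(size=3)        # query input

# With keys = prompt inputs and values = prompt labels,
# the attention output and the kernel-regression estimate coincide.
a = softmax_attention(x, xs, ys)
b = nadaraya_watson(x, xs, ys)
assert np.isclose(a, b)
```

Viewed this way, bounding the ICL generalization error reduces to analyzing a kernel smoother on the manifold, which is why the rate is controlled by the intrinsic rather than the ambient dimension.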