AI Summary
This work addresses the challenge of approximating nonlinear operators between function spaces with Transformer architectures in a way that remains stable under varying discretizations and output domains. To this end, it proposes function graph transformers, a graph-preserving subclass of measure-theoretic transformers: input functions are lifted to measures supported on their graphs, so that operator learning is formulated within a measure-theoretic framework. The graph-preserving structure guarantees that outputs remain single-valued functions, while discretization refinement is modeled as convergence of measures and query points may lie on different output domains. Unlike existing theoretical approaches to transformer-based operator learning, the framework also accommodates regularized negative-order Sobolev inputs and clarifies the roles of positional encodings and discretization consistency. Theoretically, it establishes that the graph-preserving maps can be approximated by finite compositions of standard softmax self-attention layers and pointwise MLPs, yielding universal approximation results for broad classes of nonlinear operators that are consistent across discretizations.
Abstract
We study the approximation of nonlinear operators between function spaces by transformers. Our approach is to lift functions to measures supported on their graphs and leverage a recently introduced measure-theoretic view of transformers. A function $h$ is represented by its graph measure $\gamma_h$, with finite tokens $\{(x_j,h(x_j))\}_{j=1}^N$ being its empirical approximations. We show that this framework elegantly models discretization refinement via convergence of measures and provides a natural setting for operator learning. Within this framework, we introduce function graph transformers, a graph-preserving subclass of measure-theoretic transformers that maps graph measures to graph measures, which is to say that outputs remain single-valued functions. Crucially, this additional structure does not reduce generality: we prove that the resulting graph-preserving maps can be approximated by finite compositions of standard softmax self-attention layers and pointwise MLPs, yielding universal approximation results for broad classes of nonlinear operators. Unlike existing theoretical approaches to operator learning with transformers, the measure-theoretic framework also accommodates regularized negative-order Sobolev inputs for which discretization invariance is particularly challenging, as well as query points on different output domains. Overall, function graph transformers provide a continuum viewpoint and mathematical toolkit for transformer-based operator learning, clarifying the roles of positional encodings, graph structure, and regularization, and ensuring consistency across discretizations.
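The token representation $\{(x_j,h(x_j))\}_{j=1}^N$ and the idea of discretization refinement as convergence of measures can be illustrated with a minimal NumPy sketch. It is not the paper's function graph transformer and does not enforce the graph-preserving property; it only shows one softmax attention step acting on empirical graph measures. The domain $[0,1]$, the midpoint grid, the test function, and the matrices `A` and `V` are all assumptions chosen for illustration.

```python
import numpy as np

def graph_tokens(h, n):
    """Empirical graph measure of h on [0, 1]: n tokens (x_j, h(x_j)).

    The midpoint grid is an illustrative choice, not taken from the paper.
    """
    x = (np.arange(n) + 0.5) / n
    return np.stack([x, h(x)], axis=1)  # shape (n, 2)

def attention_on_measure(queries, tokens, A, V):
    """One softmax attention step of each query against the empirical
    measure carried by `tokens`.

    Each output row is a softmax-weighted average of V-transformed tokens,
    i.e. a ratio of two integrals against the empirical measure.
    """
    scores = queries @ A @ tokens.T               # (m, n) bilinear scores
    scores -= scores.max(axis=1, keepdims=True)   # numerical stabilization
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ (tokens @ V.T)               # (m, 2)

# Hypothetical test function and attention parameters.
h = lambda x: np.sin(2 * np.pi * x)
A = np.array([[2.0, 0.5], [0.0, 1.0]])
V = np.eye(2)

# Fixed query tokens on the graph of h, evaluated against two discretizations.
queries = graph_tokens(h, 5)
coarse = attention_on_measure(queries, graph_tokens(h, 200), A, V)
fine = attention_on_measure(queries, graph_tokens(h, 2000), A, V)

# The gap shrinks as both empirical measures approach the graph measure.
gap = np.max(np.abs(coarse - fine))
print(gap)
```

Because the attention output depends on the tokens only through their empirical measure, refining the grid changes the result only slightly, which is the discretization consistency the abstract describes in the continuum limit.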