🤖 AI Summary
Transformer positional encodings and attention mechanisms have long lacked a unified geometric and physical interpretation.
Method: This paper introduces the first framework embedding Transformers within geometric field theory: discrete token positions are mapped to a continuous embedding manifold, and self-attention is formalized as a kernel-modulated integral operator defined on this manifold. By integrating manifold embedding, differential geometry, and field-theoretic principles, attention is recast as function modulation and transformation in continuous space.
Contribution/Results: The framework provides an interpretable geometric semantics for core Transformer components—unifying the mathematical foundations of positional encoding and attention—and establishes a theoretical bridge between discrete neural architectures and continuous field theory. It enables principled design of next-generation attention mechanisms endowed with explicit geometric priors, advancing both interpretability and inductive bias engineering in deep learning.
📝 Abstract
The Transformer architecture has achieved tremendous success in natural language processing, computer vision, and scientific computing through its self-attention mechanism. However, its core components, positional encoding and attention, have lacked a unified physical or mathematical interpretation. This paper proposes a structural theoretical framework that integrates positional encoding, kernel integral operators, and attention mechanisms under a single theoretical treatment. We map discrete positions (such as text token indices and image pixel coordinates) to spatial functions on continuous manifolds, enabling a field-theoretic interpretation of Transformer layers as kernel-modulated operators acting over embedded manifolds.
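The abstract's central move can be made concrete with a small sketch. This is an illustrative assumption, not the paper's implementation: discrete token indices are mapped onto a continuous embedding manifold via the standard sinusoidal positional encoding, and softmax self-attention is then read as a discretized kernel integral operator, (Kf)(x_i) ≈ Σ_j k(x_i, x_j) f(x_j), with a row-normalized kernel evaluated at the sampled manifold points. The function names and the choice of kernel (scaled dot product) are mine, chosen to mirror vanilla attention.

```python
import numpy as np

def positional_embedding(n_tokens, dim):
    """Map discrete indices 0..n-1 to points on a continuous manifold
    using the standard sinusoidal positional encoding."""
    pos = np.arange(n_tokens)[:, None]                     # (n, 1)
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))  # (dim/2,)
    angles = pos * freqs                                   # (n, dim/2)
    emb = np.zeros((n_tokens, dim))
    emb[:, 0::2] = np.sin(angles)
    emb[:, 1::2] = np.cos(angles)
    return emb

def kernel_attention(f, x):
    """Apply a softmax-normalized kernel integral operator to the field
    values f sampled at manifold points x (a discretized version of
    (Kf)(x_i) = ∫ k(x_i, y) f(y) dy)."""
    logits = x @ x.T / np.sqrt(x.shape[-1])  # kernel k(x_i, x_j)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-normalize
    return weights @ f                       # quadrature-style sum

n, d = 8, 16
x = positional_embedding(n, d)                     # manifold points
f = np.random.default_rng(0).normal(size=(n, d))   # token "field" values
out = kernel_attention(f, x)
print(out.shape)  # (8, 16)
```

In this reading, changing the positional encoding changes the geometry of the sample points, and changing the attention logits changes the kernel — which is the sense in which the paper treats both components within one framework.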