🤖 AI Summary
This work investigates the geometry of the function spaces defined by self-attention networks without normalization, focusing on identifiability and on the dimension of these spaces. Methodologically, it models deep self-attention as a polynomial map within an algebraic-geometric framework and derives a description of the generic fibers of the parameter-to-function map at arbitrary depth, thereby resolving functional identifiability. As a consequence, it obtains a closed-form formula for the dimension of the induced function space and, for the single-layer model, fully characterizes the singular and boundary points. Finally, it conjectures that analogous results hold for normalized self-attention networks, proves this conjecture for a single layer, and supports it numerically in deeper architectures via computational algebraic techniques. The core contribution is an algebraic-geometric framework for self-attention, furnishing a rigorous geometric foundation for understanding the intrinsic representational capacity of deep attention mechanisms.
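To see why attention without normalization is a polynomial map, consider the standard single-head layer (an illustrative special case; the paper treats general deep architectures). Acting on an input sequence $X$ with one token per row, it computes

$$
\mathrm{Att}(X) \;=\; \bigl(X W_Q\bigr)\bigl(X W_K\bigr)^{\top} X W_V \;=\; X\, W_Q W_K^{\top} X^{\top} X\, W_V,
$$

which is a degree-three polynomial in the entries of $X$ and is likewise polynomial in the weights $W_Q, W_K, W_V$. Dropping the softmax is exactly what keeps the map polynomial and makes the algebraic-geometric toolkit applicable; note also that the weights enter only through the product $W_Q W_K^{\top}$, a first hint at nontrivial fibers of the parametrization.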
📝 Abstract
We consider function spaces defined by self-attention networks without normalization, and theoretically analyze their geometry. Since these networks are polynomial, we rely on tools from algebraic geometry. In particular, we study the identifiability of deep attention by providing a description of the generic fibers of the parametrization for an arbitrary number of layers and, as a consequence, compute the dimension of the function space. Additionally, for a single-layer model, we characterize the singular and boundary points. Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and numerically verify it in the deep case.
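One standard way to numerically probe the dimension of such a function space is to compute the rank of the Jacobian of the parametrization at a random parameter point: generically, the rank equals the dimension of the image, and the rank deficit equals the fiber dimension. The following is a minimal sketch of that idea in JAX for a single-head, single-layer, normalization-free model with illustrative sizes; it is not the authors' code, and the specific dimensions and sampling choices are assumptions made for the example.

```python
import jax
import jax.numpy as jnp

# Illustrative sizes (assumptions, not from the paper):
d, n, num_samples = 3, 4, 8  # embedding dim, sequence length, input samples

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
# Random input sequences on which we evaluate the network as a function.
Xs = jax.random.normal(k1, (num_samples, n, d))

def attention(Wq, Wk, Wv, X):
    # Unnormalized self-attention: no softmax, so the map stays polynomial.
    scores = (X @ Wq) @ (X @ Wk).T
    return scores @ (X @ Wv)

def parametrization(theta):
    # Flat parameter vector -> network outputs on all sampled inputs.
    Wq, Wk, Wv = theta.reshape(3, d, d)
    return jnp.stack([attention(Wq, Wk, Wv, X) for X in Xs]).ravel()

theta0 = jax.random.normal(k2, (3 * d * d,))
J = jax.jacfwd(parametrization)(theta0)  # shape: (outputs, parameters)

print("parameters:", theta0.size)
print("Jacobian rank (local estimate of function-space dimension):",
      int(jnp.linalg.matrix_rank(J)))
```

With these sizes one expects the rank to fall short of the parameter count: the symmetry (Wq, Wk) -> (Wq A, Wk A^{-T}) for invertible A leaves the computed function unchanged, so the fibers are at least d^2-dimensional, and the Jacobian rank drops accordingly. The paper's exact results replace this local numerical estimate with closed-form dimension formulas and a full description of the generic fibers.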