🤖 AI Summary
This work investigates the geometry of the function spaces defined by self-attention networks without normalization, focusing on identifiability and on the dimension of these spaces. Methodologically, it models deep self-attention as a polynomial map within an algebraic-geometric framework and derives a description of the generic fibers of the parameter-to-function map at arbitrary depth, thereby resolving functional identifiability. As a consequence, it obtains a closed-form formula for the dimension of the induced function space and, for the single-layer model, fully characterizes the singular and boundary points. Finally, it conjectures that analogous results hold for normalized self-attention networks, proves this conjecture for a single layer, and supports it numerically in deeper architectures via computational algebraic techniques. The core contribution is an algebraic-geometric framework for self-attention, furnishing a rigorous geometric foundation for understanding the intrinsic representational capacity of deep attention mechanisms.
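To see why attention without normalization is a polynomial map, consider the standard single-head layer (an illustrative special case; the paper treats general deep architectures). Acting on an input sequence $X$ with one token per row, it computes

$$
\mathrm{Att}(X) \;=\; \bigl(X W_Q\bigr)\bigl(X W_K\bigr)^{\top} X W_V \;=\; X\, W_Q W_K^{\top} X^{\top} X\, W_V,
$$

which is a degree-three polynomial in the entries of $X$ and is likewise polynomial in the weights $W_Q, W_K, W_V$. Dropping the softmax is exactly what keeps the map polynomial and makes the algebraic-geometric toolkit applicable; note also that the weights enter only through the product $W_Q W_K^{\top}$, a first hint at nontrivial fibers of the parametrization.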
📝 Abstract
We consider function spaces defined by self-attention networks without normalization, and theoretically analyze their geometry. Since these networks are polynomial, we rely on tools from algebraic geometry. In particular, we study the identifiability of deep attention by providing a description of the generic fibers of the parametrization for an arbitrary number of layers and, as a consequence, compute the dimension of the function space. Additionally, for a single-layer model, we characterize the singular and boundary points. Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and numerically verify it in the deep case.
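One standard way to numerically probe the dimension of such a function space is to compute the rank of the Jacobian of the parametrization at a random parameter point: generically, the rank equals the dimension of the image, and the rank deficit equals the fiber dimension. The following is a minimal sketch of that idea in JAX for a single-head, single-layer, normalization-free model with illustrative sizes; it is not the authors' code, and the specific dimensions and sampling choices are assumptions made for the example.

```python
import jax
import jax.numpy as jnp

# Illustrative sizes (assumptions, not from the paper):
d, n, num_samples = 3, 4, 8  # embedding dim, sequence length, input samples

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
# Random input sequences on which we evaluate the network as a function.
Xs = jax.random.normal(k1, (num_samples, n, d))

def attention(Wq, Wk, Wv, X):
    # Unnormalized self-attention: no softmax, so the map stays polynomial.
    scores = (X @ Wq) @ (X @ Wk).T
    return scores @ (X @ Wv)

def parametrization(theta):
    # Flat parameter vector -> network outputs on all sampled inputs.
    Wq, Wk, Wv = theta.reshape(3, d, d)
    return jnp.stack([attention(Wq, Wk, Wv, X) for X in Xs]).ravel()

theta0 = jax.random.normal(k2, (3 * d * d,))
J = jax.jacfwd(parametrization)(theta0)  # shape: (outputs, parameters)

print("parameters:", theta0.size)
print("Jacobian rank (local estimate of function-space dimension):",
      int(jnp.linalg.matrix_rank(J)))
```

With these sizes one expects the rank to fall short of the parameter count: the symmetry (Wq, Wk) -> (Wq A, Wk A^{-T}) for invertible A leaves the computed function unchanged, so the fibers are at least d^2-dimensional, and the Jacobian rank drops accordingly. The paper's exact results replace this local numerical estimate with closed-form dimension formulas and a full description of the generic fibers.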