🤖 AI Summary
This work investigates the geometric structure of Transformer representation spaces, asking how attention mechanisms shape the underlying geometry of learned embeddings. Method: We introduce "attention-induced curvature," modeling self-attention as discrete parallel transport on a curved manifold (an analogy drawn from general relativity), and construct an effective metric from query-key similarities to quantify the bending of embedding trajectories across layers. Contextual editing experiments are designed to detect semantically coherent trajectory deflections. Contribution/Results: We provide the first formal definition of curvature in Transformer latent spaces; empirically demonstrate that embedding paths deviate significantly from geodesics, as evidenced by elevated length-to-chord ratios and non-random angular distributions; visualize paragraph-level curvature landscapes; and establish measurable functional impacts of curvature on representation evolution and semantic stability. This work establishes a novel geometric paradigm for interpreting large language models.
📝 Abstract
We present a geometric framework for understanding Transformer-based language models, drawing an explicit analogy to General Relativity. Queries and keys induce an effective metric on representation space, and attention acts as a discrete connection that implements parallel transport of value vectors across tokens. Stacked layers provide discrete time-slices through which token representations evolve on this curved manifold, while backpropagation plays the role of a least-action principle that shapes loss-minimizing trajectories in parameter space. If this analogy is correct, token embeddings should not traverse straight paths in feature space; instead, their layer-wise steps should bend and reorient through interactions mediated by embedding-space curvature. To test this prediction, we design experiments that expose both the presence and the consequences of curvature: (i) we visualize a curvature landscape for a full paragraph, revealing how local turning angles vary across tokens and layers; (ii) we show through simulations that excess counts of sharp and flat angles, together with elevated length-to-chord ratios, are not explainable by dimensionality or chance; and (iii) inspired by the eclipse test of Einstein's light-deflection prediction, we probe deflection under controlled context edits, demonstrating measurable, meaning-consistent bends in embedding trajectories that confirm attention-induced curvature.
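The two trajectory diagnostics named in the abstract, local turning angles between consecutive layer-wise steps and the length-to-chord ratio of a token's path, can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the paper's implementation: the function names are invented here, and the synthetic random-walk trajectory stands in for a real token's hidden states stacked across layers.

```python
import numpy as np

def turning_angles(traj):
    """Angles (radians) between consecutive layer-wise steps of a trajectory.

    traj: array of shape (L+1, d) -- one token's hidden state at each of L+1 layers.
    Returns an array of shape (L-1,); 0 means the step continued straight,
    values near pi mean a sharp reversal.
    """
    steps = np.diff(traj, axis=0)                      # (L, d) layer-to-layer displacements
    a, b = steps[:-1], steps[1:]
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    return np.arccos(np.clip(cos, -1.0, 1.0))          # clip guards against rounding error

def length_to_chord(traj):
    """Ratio of total path length to straight-line (chord) distance.

    Equals 1 exactly when the trajectory is a straight line (a Euclidean
    geodesic); values above 1 indicate bending.
    """
    path = np.linalg.norm(np.diff(traj, axis=0), axis=1).sum()
    chord = np.linalg.norm(traj[-1] - traj[0])
    return path / chord

# Stand-in data: a random walk in 64 dimensions playing the role of a
# token's embedding trajectory through 13 layers.
rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(size=(13, 64)), axis=0)

angles = turning_angles(traj)   # 11 turning angles for 12 steps
ratio = length_to_chord(traj)   # > 1 whenever the path bends
```

By the triangle inequality the ratio is at least 1, so a straight (geodesic-like) path gives exactly 1; the abstract's claim is that real token trajectories sit measurably above that baseline.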