🤖 AI Summary
This work investigates the theoretical relationship between Transformer self-attention and fundamental representational properties, namely feature polysemanticity, superposition, and classification performance. We formally establish that multi-head self-attention is dynamically equivalent to optimizing an implicit multinomial regression over latent features, and that the resulting evolution of representations follows a trajectory converging toward optimal class-discriminative features. Methodologically, we construct a fixed multinomial regression framework and rigorously characterize how each attention block progressively refines latent representations along an optimization path; convergence is empirically validated via representation-dynamics analysis. Our key contribution is a rigorous theoretical link between self-attention and implicit multinomial regression, revealing how attention-driven feature reweighting within overcomplete representations enhances model performance. This provides a new analytical lens on Transformers' inductive bias and generalization behavior.
📝 Abstract
Mechanistic interpretability aims to understand how internal components of modern machine learning models, such as weights, activations, and layers, give rise to the model's overall behavior. One particularly opaque mechanism is attention: despite its central role in transformer models, its mathematical underpinnings and relationship to concepts like feature polysemanticity, superposition, and model performance remain poorly understood. This paper establishes a novel connection between attention mechanisms and multinomial regression. Specifically, we show that in a fixed multinomial regression setting, optimizing over latent features yields optimal solutions that align with the dynamics induced by attention blocks. In other words, the evolution of representations through a transformer can be interpreted as a trajectory that recovers the optimal features for classification.
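The abstract's core claim can be made concrete with a toy experiment. The sketch below is our own minimal illustration, not the paper's construction: we fix a multinomial (softmax) regression classifier `W` and optimize the latent features `Z` themselves by gradient descent on the cross-entropy loss. Under the paper's thesis, the layer-by-layer evolution of representations through a transformer traces a trajectory of this kind toward class-discriminative features; all variable names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 32, 8, 3            # samples, latent dimension, classes (arbitrary)
W = rng.normal(size=(d, k))   # fixed classifier: the regression is "fixed"
y = rng.integers(0, k, size=n)
Z = rng.normal(size=(n, d))   # initial latent features, to be optimized

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss(Z):
    # multinomial-regression cross-entropy with W held fixed
    p = softmax(Z @ W)
    return -np.log(p[np.arange(n), y]).mean()

lr = 0.1
losses = [loss(Z)]
for _ in range(300):
    p = softmax(Z @ W)
    p[np.arange(n), y] -= 1.0   # gradient of the loss w.r.t. the logits
    Z -= lr * (p @ W.T)         # gradient step on the features, not the weights
    losses.append(loss(Z))

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because `Z` is unconstrained and the per-sample problem is convex in the logits, the loss decreases monotonically and the features become linearly separable under the fixed classifier, which is the sense in which the optimization trajectory "recovers the optimal features for classification".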