Transformers Are Born Biased: Structural Inductive Biases at Random Initialization and Their Practical Consequences

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work challenges the prevailing assumption that randomly initialized Transformers lack structural biases, revealing instead that they exhibit systematic structural preferences even before training. Through mechanistic interpretability and representational geometry analyses, the authors demonstrate that the interplay between MLP nonlinearities and self-attention causes hidden representations to contract along directions determined by the random initialization seed. Building on this insight, they propose SeedPrint, a model fingerprinting technique based on initialization structure that reliably distinguishes models differing only in their random seeds. Furthermore, the study establishes a causal link between this attention-induced contraction and the attention sink phenomenon, offering a theoretical foundation for understanding and controlling it. The findings indicate that such initialization biases persist throughout training, underscoring the profound influence of the random seed on model behavior.
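The seed-as-identity idea behind SeedPrint can be illustrated with a toy sketch. This is not the paper's algorithm; the MLP-only model, the `fingerprint` function, and all dimensions are illustrative assumptions. The point it demonstrates: averaging an untrained network's outputs over random probe inputs yields a direction that stays stable across different probe distributions but differs sharply across initialization seeds.

```python
import numpy as np

def fingerprint(model_seed, probe_seed=123, V=128, d=32, n=500):
    """Seed-dependent signature of an untrained ReLU MLP (illustrative only)."""
    rng = np.random.default_rng(model_seed)
    W1 = rng.normal(0, d ** -0.5, (d, 4 * d))       # untrained hidden layer
    W2 = rng.normal(0, (4 * d) ** -0.5, (4 * d, d))
    U = rng.normal(0, d ** -0.5, (d, V))            # unembedding to "vocab"
    # Random probe inputs; probe_seed models which data we happen to feed in.
    x = np.random.default_rng(probe_seed).normal(0, 1, (n, d))
    logits = np.maximum(x @ W1, 0) @ W2 @ U         # asymmetric ReLU nonlinearity
    f = logits.mean(axis=0)                         # average output direction
    return f / np.linalg.norm(f)

# Same model, different probe data: fingerprints agree (cosine near 1).
same_model = fingerprint(0, probe_seed=1) @ fingerprint(0, probe_seed=2)
# Different seeds, identical probe data: fingerprints are near-orthogonal.
diff_model = fingerprint(0, probe_seed=1) @ fingerprint(1, probe_seed=1)
print(same_model, diff_model)
```

The ReLU's positive mean shift is what makes the averaged output direction a property of the weights rather than the inputs, mirroring the summary's claim that asymmetric nonlinearities concentrate representations along a seed-determined direction.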

📝 Abstract
Transformers underpin modern large language models (LLMs) and are commonly assumed to be behaviorally unstructured at random initialization, with all meaningful preferences emerging only through large-scale training. We challenge this assumption by showing that randomly initialized transformers already exhibit strong and systematic structural biases. In particular, untrained models display extreme token preferences: across random input sequences, certain tokens are predicted with probabilities orders of magnitude larger than those of other tokens. We provide a mechanistic explanation for this phenomenon by dissecting the transformer architecture at initialization. We show that extreme token preference arises from a contraction of token representations along a random seed-dependent direction. This contraction is driven by two interacting forces: (i) asymmetric nonlinear activations in MLP sublayers induce global (inter-sequence) representation concentration, and (ii) self-attention further amplifies this effect through local (intra-sequence) aggregation. Together, these mechanisms align hidden representations along a direction determined solely by the random initialization, producing highly non-uniform next-token predictions. Beyond mechanistic insight, we demonstrate that these initialization-induced biases persist throughout training, forming a stable and intrinsic model identity. Leveraging this property, we introduce SeedPrint, a fingerprinting method that can reliably distinguish models that differ only in their random initialization, even after extensive training and under substantial distribution shift. Finally, we identify a fundamental positional discrepancy inherent to the attention mechanism's intra-sequence contraction that is causally linked to the attention-sink phenomenon. This discovery provides a principled explanation for the emergence of sinks and offers a pathway for their control.
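The "extreme token preference" phenomenon is easy to probe in miniature. Below is a hedged NumPy sketch, not the authors' code: the simplified architecture (no LayerNorm, a single attention head, toy dimensions, and a tied unembedding) is an assumption for illustration. It feeds random token sequences through a randomly initialized stack of attention and ReLU MLP blocks and averages the next-token distributions; any non-uniformity that survives the averaging is attributable to the initialization, not the inputs.

```python
import numpy as np

rng = np.random.default_rng(0)          # the random seed whose bias we probe
V, d, T, L = 256, 64, 16, 4             # toy vocab, width, sequence length, depth

E = rng.normal(0, d ** -0.5, (V, d))    # token embeddings (tied unembedding)
layers = [
    {name: rng.normal(0, d ** -0.5, shape)
     for name, shape in [("Wq", (d, d)), ("Wk", (d, d)), ("Wv", (d, d)),
                         ("W1", (d, 4 * d)), ("W2", (4 * d, d))]}
    for _ in range(L)
]

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def next_token_probs(tokens):
    h = E[tokens]                                        # (T, d)
    for w in layers:
        att = softmax((h @ w["Wq"]) @ (h @ w["Wk"]).T / np.sqrt(d))
        h = h + att @ (h @ w["Wv"])                      # intra-sequence aggregation
        h = h + np.maximum(h @ w["W1"], 0) @ w["W2"]     # asymmetric (ReLU) MLP
    return softmax(h[-1] @ E.T)                          # next-token distribution

# Average predictions over many *random* input sequences.
probs = np.mean([next_token_probs(rng.integers(0, V, T))
                 for _ in range(200)], axis=0)
print("max/min probability ratio:", probs.max() / probs.min())
```

On typical seeds the averaged distribution is visibly skewed toward a handful of tokens, in line with the abstract's claim; changing `default_rng(0)` to another seed changes which tokens are preferred.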
Problem

Research questions and friction points this paper is trying to address.

structural bias
random initialization
token preference
transformer architecture
inductive bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

structural bias
random initialization
representation contraction
SeedPrint
attention sink