🤖 AI Summary
This paper investigates the identifiability of linear structures—such as morphological vector parallelism—in language models: if a model exhibits a specific linear property, must all models with identical next-token prediction distributions necessarily share that property?
Method: Under a weakened diversity assumption, the authors provide the first complete characterization of distributionally equivalent predictors.
Contribution/Results: They prove that linear properties exhibit “all-or-nothing” identifiability: under reasonable conditions, if one model in a distributional equivalence class satisfies a given linear relation, then all of its distributionally equivalent models satisfy it, or none do. This result unifies diverse geometric linear phenomena observed in neural language models and establishes the first rigorous, distribution-equivalence-based theoretical foundation for neural-symbolic alignment and model interpretability.
📝 Abstract
We analyze identifiability as a possible explanation for the ubiquity of linear properties across language models, such as the vector difference between the representations of "easy" and "easiest" being parallel to that between "lucky" and "luckiest". For this, we ask whether finding a linear property in one model implies that any model that induces the same distribution has that property, too. To answer this, we first prove an identifiability result to characterize distribution-equivalent next-token predictors, lifting a diversity requirement of previous results. Second, based on a refinement of relational linearity [Paccanaro and Hinton, 2001; Hernandez et al., 2024], we show how many notions of linearity are amenable to our analysis. Finally, we show that under suitable conditions, these linear properties either hold in all distribution-equivalent next-token predictors or in none of them.
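The parallelism property described above can be illustrated with a minimal sketch. The toy embeddings and the shared `-est` offset below are assumptions for illustration, not the paper's construction: in an idealized model, applying the same morphological relation adds a fixed offset vector, so the difference vectors for two word pairs are parallel (cosine similarity 1).

```python
import numpy as np

# Hypothetical toy embeddings: assume the morphological shift "-est"
# corresponds to adding one shared offset vector (idealized setting).
rng = np.random.default_rng(0)
offset = rng.normal(size=8)   # shared "-est" direction (assumption)
easy = rng.normal(size=8)
lucky = rng.normal(size=8)
easiest = easy + offset
luckiest = lucky + offset

def cosine(u, v):
    """Cosine similarity; 1.0 means the two vectors are parallel."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# The linear property: difference vectors of the same relation are parallel.
parallelism = cosine(easiest - easy, luckiest - lucky)
print(round(parallelism, 6))  # → 1.0 by construction
```

In a real model the cosine would be measured empirically and be close to, rather than exactly, 1; the paper's question is whether such a measured property must transfer to every model inducing the same next-token distribution.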