Solvable Dynamics of Self-Supervised Word Embeddings and the Emergence of Analogical Reasoning

📅 2025-02-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates how self-supervised word embeddings, in particular quadratic models, spontaneously develop structured latent representations and emergent analogical reasoning through their training dynamics. We propose an analytically tractable quadratic self-supervised word embedding model, combining contrastive learning with a matrix differential-equation analysis to derive closed-form solutions for both the training dynamics and the final embeddings. Theoretically, we show that the embedding space activates orthogonal linear subspaces one after another, ordered by the co-occurrence statistics of the corpus; analogical reasoning emerges incrementally as these subspaces activate, and its onset and evolution are precisely predictable. Empirical validation on WikiText confirms that each activated subspace corresponds to interpretable semantic concepts, and the theory accurately predicts both the emergence timing and the developmental trajectory of analogy-completion performance. Our core contribution is an analytically grounded causal chain linking training dynamics, representational structure, and downstream reasoning capability.

📝 Abstract
The remarkable success of large language models relies on their ability to implicitly learn structured latent representations from the pretraining corpus. As a simpler surrogate for representation learning in language modeling, we study a class of solvable contrastive self-supervised algorithms which we term quadratic word embedding models. These models resemble the word2vec algorithm and perform similarly on downstream tasks. Our main contributions are analytical solutions for both the training dynamics (under certain hyperparameter choices) and the final word embeddings, given in terms of only the corpus statistics. Our solutions reveal that these models learn orthogonal linear subspaces one at a time, each one incrementing the effective rank of the embeddings until model capacity is saturated. Training on WikiText, we find that the top subspaces represent interpretable concepts. Finally, we use our dynamical theory to predict how and when models acquire the ability to complete analogies.
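The stepwise subspace learning described in the abstract can be illustrated with a minimal sketch. Everything below is an illustrative assumption rather than the paper's setup: a synthetic PSD "statistics" matrix with a hand-picked spectrum stands in for the corpus statistics, and plain gradient descent on a symmetric factorization loss stands in for the quadratic model's dynamics. The point it demonstrates is the one the paper proves: directions with larger eigenvalues activate first, so the effective rank of the embeddings grows one subspace at a time until capacity is saturated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative symmetric PSD target with well-separated eigenvalues.
# (The paper derives its target from corpus co-occurrence statistics;
# this spectrum is an assumption for the demo.)
V, d = 20, 4  # "vocabulary" size, embedding dimension
Q, _ = np.linalg.qr(rng.standard_normal((V, V)))
eigs = np.array([4.0, 2.0, 1.0, 0.5] + [0.05] * (V - 4))
M = Q @ np.diag(eigs) @ Q.T

# Quadratic model: embeddings W (V x d) approximate M via W @ W.T.
# Gradient descent on L = 0.5 * ||W W^T - M||_F^2 from a small random init.
W = 1e-3 * rng.standard_normal((V, d))
lr, steps = 0.002, 6000
ranks = []
for t in range(steps):
    if t % 500 == 0:
        s = np.linalg.svd(W, compute_uv=False)
        ranks.append(int((s > 0.5).sum()))  # count "activated" directions
    W -= lr * 2.0 * (W @ W.T - M) @ W  # dL/dW for symmetric M

# Effective rank grows stepwise, largest-eigenvalue subspace first, and
# W W^T converges to the best rank-d approximation of M.
print(ranks)
top_d = Q[:, :d] @ np.diag(eigs[:d]) @ Q[:, :d].T
print(float(np.max(np.abs(W @ W.T - top_d))))
```

The stepwise shape of `ranks` is the discrete analogue of the paper's result that each orthogonal subspace activates at a predictable time set by its corpus statistic, here its eigenvalue.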
Problem

Research questions and friction points this paper is trying to address.

Can the training dynamics of self-supervised word embeddings be solved analytically?
How do interpretable concept subspaces emerge during training?
When, and how quickly, do models acquire analogy completion?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Solvable quadratic word embedding models
Closed-form solutions for training dynamics and final embeddings
Dynamical prediction of analogy acquisition
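The analogy-completion task whose acquisition the paper predicts is the standard vector-offset evaluation. A minimal sketch with hypothetical hand-written embeddings (not trained vectors from the paper) shows the mechanics: answer "a is to b as c is to ?" by taking the nearest neighbor of b - a + c, excluding the query words.

```python
import numpy as np

# Hypothetical toy embeddings with a shared "gender" offset; real models
# would learn such parallel offsets from corpus statistics.
emb = {
    "king":  np.array([1.0, 1.0, 0.0]),
    "queen": np.array([1.0, 0.0, 1.0]),
    "man":   np.array([0.0, 1.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0]),
    "apple": np.array([0.3, 0.2, 0.1]),
}

def complete_analogy(a, b, c, emb):
    """Return argmax_w cos(emb[w], emb[b] - emb[a] + emb[c]), w not in {a, b, c}."""
    target = emb[b] - emb[a] + emb[c]
    best, best_sim = None, -np.inf
    for w, v in emb.items():
        if w in (a, b, c):
            continue
        sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target) + 1e-12)
        if sim > best_sim:
            best, best_sim = w, sim
    return best

print(complete_analogy("man", "king", "woman", emb))  # → queen
```

In the paper's picture, this completion starts to succeed only once the subspace encoding the relevant offset (here, the "gender" direction) has activated, which is what makes the acquisition time predictable.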