Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence

📅 2026-05-22
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This study investigates how hypernymy (“is-a”) hierarchies are encoded geometrically in language models. Building on the distributional hypothesis, it proves that semantic hierarchical structure emerges from the spectral properties of word co-occurrence statistics, without any explicit modeling of the hierarchy. The analysis characterizes the spectrum of the Gram matrix of word2vec embeddings under positivity and decay conditions on the word co-occurrence kernel, using the WordNet hypernym graph as the reference hierarchy, and shows that conceptual hierarchies appear in vector space as a coarse-to-fine splitting geometry. Experiments on many sampled WordNet subtrees in word2vec embeddings, and on the unembedding layer of Gemma 2B, are consistent with this pattern, indicating that hierarchical concept geometry can arise from the spectral structure of pairwise word co-occurrences.
📝 Abstract
We propose a distributional theory of how hypernymy, the “is-a” relation between general and specific concepts, is encoded geometrically in language representations. Starting from the empirically verified assumption that words closer on the WordNet hypernym graph co-occur more often, we characterize theoretically the spectrum of the resulting Gram matrix of word2vec embeddings. Under mild positivity and decay conditions on the co-occurrence kernel, we prove that the leading eigenvectors first separate broad taxonomic branches and then progressively finer sub-branches, producing a hierarchical splitting geometry with a coarse-to-fine spectral organization that mirrors the tree. We confirm these predictions in word2vec embeddings across many sampled WordNet subtrees, and show that the same signature extends strikingly well to Gemma 2B unembeddings. Our results indicate that hierarchical concept geometry in LLMs need not reflect a hierarchy-specific functional mechanism, but emerges from the spectral structure of pairwise word statistics.
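The coarse-to-fine spectral claim in the abstract can be illustrated with a toy example. The sketch below is not the paper's construction: it assumes a balanced binary tree with 8 leaf "words" and a hypothetical kernel of the form decay ** (tree distance), chosen only because it is positive and decays with distance on the tree. The function names (tree_distance, toy_kernel, sorted_eigenpairs) and the decay value 0.5 are illustrative assumptions.

```python
import numpy as np


def tree_distance(i: int, j: int) -> int:
    """Path length between leaves i and j of a balanced binary tree.

    Leaves are labelled by integers whose binary digits spell the path
    from the root. The highest bit in which i and j differ gives the
    height of their lowest common ancestor above the leaf level.
    """
    height = (i ^ j).bit_length()
    return 2 * height


def toy_kernel(depth: int = 3, decay: float = 0.5) -> np.ndarray:
    """Hypothetical kernel: similarity decays geometrically with tree distance."""
    n = 2 ** depth
    dist = np.array([[tree_distance(i, j) for j in range(n)] for i in range(n)])
    return decay ** dist


def sorted_eigenpairs(kernel: np.ndarray):
    """Eigen-decompose a symmetric kernel, largest eigenvalue first."""
    eigenvalues, eigenvectors = np.linalg.eigh(kernel)
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]


if __name__ == "__main__":
    values, vectors = sorted_eigenpairs(toy_kernel(depth=3, decay=0.5))
    print("eigenvalues:", np.round(values, 4))
    # Sign pattern of the second eigenvector across the 8 leaves.
    print("second eigenvector signs:", np.sign(vectors[:, 1]).astype(int))
```

In this toy setting the leading eigenvector is constant over all leaves, the second has one sign on leaves 0 to 3 and the opposite sign on leaves 4 to 7 (the top-level branch split), the next two eigenvectors share one eigenvalue and split within each half, and the remaining four separate sibling pairs. The eigenvalues decrease in that same order, which is the coarse-to-fine organization the abstract describes, here under much stronger symmetry assumptions than real co-occurrence data would satisfy.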
Problem

Research questions and friction points this paper is trying to address.

hypernymy
hierarchical concept geometry
word co-occurrence
language models
spectral structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical concept geometry
hypernymy
word co-occurrence
spectral structure
distributional semantics