🤖 AI Summary
This work investigates whether the embedding spaces of large language models (LLMs) intrinsically exhibit tree-like or hierarchical geometric structure, aiming to uncover their underlying organizational principles.
Method: We propose the first systematic geometric assessment of hierarchy in LLM embeddings by jointly leveraging three complementary metrics: δ-hyperbolicity (quantifying deviation from tree-likeness), ultrametricity (a strict criterion for hierarchical structure), and adjacency-based connectivity (an algorithm-aware measure of tree similarity). We further conduct distance consistency analysis, tree reconstruction, and geometric statistical evaluation.
Results: We find that mainstream LLM embeddings consistently exhibit significant, though varying, degrees of both hyperbolicity and ultrametricity; crucially, these geometric properties strongly correlate with downstream task performance. Our findings establish hierarchical geometry as an effective proxy for representation quality, offering a novel paradigm for embedding evaluation and model design.
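As a concrete illustration of the first two criteria, both can be sketched as brute-force worst-case checks on a pairwise distance matrix. This is a minimal sketch, not the paper's implementation: the function names and the exhaustive enumeration over quadruples/triples are illustrative assumptions.

```python
import itertools

import numpy as np

def delta_hyperbolicity(D):
    """Worst-case Gromov four-point delta of distance matrix D.

    For each quadruple, the three pairwise-sum matchings are sorted;
    delta is half the gap between the two largest sums, maximised over
    all quadruples. A tree metric gives 0; larger is less tree-like.
    """
    n = D.shape[0]
    delta = 0.0
    for x, y, z, w in itertools.combinations(range(n), 4):
        s = sorted([D[x, y] + D[z, w], D[x, z] + D[y, w], D[x, w] + D[y, z]])
        delta = max(delta, (s[2] - s[1]) / 2.0)
    return delta

def ultrametricity_defect(D):
    """Worst-case violation of the strong triangle inequality
    d(x, z) <= max(d(x, y), d(y, z)).

    Equivalently: in every triple of an ultrametric, the two largest
    distances coincide, so the defect is zero iff D is ultrametric.
    """
    n = D.shape[0]
    defect = 0.0
    for x, y, z in itertools.combinations(range(n), 3):
        t = sorted([D[x, y], D[x, z], D[y, z]])
        defect = max(defect, t[2] - t[1])
    return defect
```

For example, the path metric $d(i,j) = |i - j|$ on four points is a tree metric (zero delta) but not an ultrametric (positive defect), which shows the two criteria measure genuinely different things.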
📝 Abstract
The rapid advancement of large language models (LLMs) has enabled significant strides across many fields. This paper introduces a novel approach to evaluating LLM embeddings through their inherent geometric properties. We investigate the structural properties of these embeddings via three complementary metrics: $\delta$-hyperbolicity, ultrametricity, and Neighbor Joining. $\delta$-hyperbolicity, a measure derived from geometric group theory, quantifies how far a metric space deviates from a tree-like structure. In contrast, ultrametricity characterizes strictly hierarchical structures, in which distances obey a strong triangle inequality. Neighbor Joining also quantifies how tree-like the distance relationships are, but specifically with respect to the tree reconstructed by the Neighbor Joining algorithm. By analyzing LLM embeddings with these metrics, we uncover the extent to which the embedding space reflects an underlying hierarchical or tree-like organization. Our findings reveal that LLM embeddings exhibit varying degrees of hyperbolicity and ultrametricity, and that these properties correlate with performance on downstream machine learning tasks.
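The Neighbor Joining criterion can be sketched as: fit an NJ tree to the pairwise distances, then compare the tree's path metric against the original distances; the gap is zero exactly when the input is additive (a tree metric). The function names and the worst-case distortion measure below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def neighbor_joining(D):
    """Fit an unrooted tree to distance matrix D via classic NJ updates.

    Returns a dict {(child, parent): branch_length}; leaf i keeps id i,
    internal nodes get fresh ids n, n+1, ...
    """
    n = D.shape[0]
    dist = {(i, j): float(D[i, j]) for i in range(n) for j in range(n)}
    nodes, edges, nxt = list(range(n)), {}, n
    while len(nodes) > 3:
        m = len(nodes)
        r = {i: sum(dist[i, k] for k in nodes if k != i) for i in nodes}
        # Q-criterion: join the pair minimising (m-2)*d(i,j) - r_i - r_j.
        i, j = min(((a, b) for a in nodes for b in nodes if a < b),
                   key=lambda p: (m - 2) * dist[p] - r[p[0]] - r[p[1]])
        u = nxt
        nxt += 1
        li = dist[i, j] / 2 + (r[i] - r[j]) / (2 * (m - 2))
        edges[(i, u)], edges[(j, u)] = li, dist[i, j] - li
        for k in nodes:
            if k not in (i, j):
                dist[u, k] = dist[k, u] = (dist[i, k] + dist[j, k] - dist[i, j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Terminal three-node star.
    a, b, c = nodes
    v = nxt
    edges[(a, v)] = (dist[a, b] + dist[a, c] - dist[b, c]) / 2
    edges[(b, v)] = (dist[a, b] + dist[b, c] - dist[a, c]) / 2
    edges[(c, v)] = (dist[a, c] + dist[b, c] - dist[a, b]) / 2
    return edges

def tree_distances(edges):
    """All-pairs path lengths in the fitted tree, as nested dicts."""
    adj = {}
    for (a, b), w in edges.items():
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    out = {}
    for s in adj:
        d, stack = {s: 0.0}, [s]
        while stack:
            x = stack.pop()
            for y, w in adj[x]:
                if y not in d:
                    d[y] = d[x] + w
                    stack.append(y)
        out[s] = d
    return out

def nj_distortion(D):
    """Worst-case gap between the NJ tree metric and D over the leaves;
    zero exactly when D is additive (realisable by a tree)."""
    T = tree_distances(neighbor_joining(D))
    n = D.shape[0]
    return max(abs(T[i][j] - D[i, j])
               for i in range(n) for j in range(n) if i != j)
```

Because NJ is consistent on additive inputs, `nj_distortion` vanishes on any tree metric; on embedding distances it is generally positive, and its size is one way to read off how tree-like the space is.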