🤖 AI Summary
This work addresses the challenge of embedding hierarchical data in Euclidean space, where bounded-radius embeddings suffer volume collapse and consequently incur exponentially growing sample complexity. By contrasting Euclidean and hyperbolic representations, we demonstrate that hyperbolic geometry circumvents this bottleneck, enabling sample-efficient learning with $O(1)$ Lipschitz constants. We establish the first theoretical guarantee that, for tree-structured data with depth $R$ and branching factor $m$, hyperbolic embeddings achieve information-theoretically optimal sample complexity using only $O(mR \log m)$ samples. Furthermore, we show that any rank-$k$ prediction space, regardless of its specific geometry, captures only $O(k)$ canonical hierarchical contrasts: a representational limitation inherent to the geometric incompatibility between low-rank spaces and hierarchical structure. Our analysis combines hyperbolic embeddings, Lipschitz regularization, lower bounds via Fano's inequality, and capacity-control theory to rigorously substantiate the theoretical advantages of hyperbolic representations in hierarchical learning.
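As a rough numeric illustration of the volume-collapse obstruction (a sketch under our own assumptions, not the paper's proof), the snippet below runs the standard ball-packing estimate: the $m^R$ leaves of a depth-$R$ tree sit at pairwise tree distance $2R$, but a radius-$r$ ball in $\mathbb{R}^d$ forces some pair of the $m^R$ embedded leaves within roughly $2r \cdot m^{-R/d}$ of each other, so any map realizing tree distances needs a Lipschitz constant on the order of $R\,m^{R/d} = \exp(\Omega(R))$ for fixed $d$. The parameter values ($m=3$, $d=5$, $r=1$) and the simplified packing constant are illustrative choices.

```python
# Numeric sketch of the Euclidean volume-collapse obstruction.
# Parameter values below are illustrative assumptions, not the paper's.
m, d, r = 3, 5, 1.0   # branching factor, Euclidean dimension, ball radius

for R in (4, 8, 16, 32, 64):
    n_leaves = m ** R                    # leaves at pairwise tree distance 2R
    eps = 2 * r * n_leaves ** (-1 / d)   # packing bound: some pair this close
    lip = 2 * R / eps                    # Lipschitz constant forced on any map
                                         # that realizes tree distances
    print(f"R={R:2d}  leaves={n_leaves:.1e}  "
          f"closest pair<={eps:.1e}  Lip>={lip:.1e}")
```

The printed Lipschitz lower bound grows exponentially in $R$ for any fixed dimension $d$, which is the bottleneck the capacity-control argument converts into exponential sample complexity.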
📝 Abstract
We prove an exponential separation in sample complexity between Euclidean and hyperbolic representations for learning on hierarchical data under standard Lipschitz regularization. For depth-$R$ hierarchies with branching factor $m$, we first establish a geometric obstruction for Euclidean space: any bounded-radius embedding forces volumetric collapse, mapping exponentially many tree-distant points to nearby locations. This necessitates Lipschitz constants scaling as $\exp(\Omega(R))$ to realize even simple hierarchical targets, yielding exponential sample complexity under capacity control. We then show this obstruction vanishes in hyperbolic space: constant-distortion hyperbolic embeddings admit $O(1)$-Lipschitz realizability, enabling learning with $n = O(mR \log m)$ samples. A matching $\Omega(mR \log m)$ lower bound via Fano's inequality establishes that hyperbolic representations achieve the information-theoretic optimum. Finally, we establish a geometry-independent bottleneck: any rank-$k$ prediction space captures only $O(k)$ canonical hierarchical contrasts.
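To make the hyperbolic side concrete, here is a minimal Python sketch of the Poincaré-ball distance. Placing a depth-$k$ node at Euclidean radius $\tanh(k/2)$ keeps every coordinate strictly inside the unit disk, yet its hyperbolic distance from the origin is exactly $k$, so distances between leaves in different branches track their tree distance $2k$ with only an additive constant. This toy placement is our illustration of the constant-distortion phenomenon, not the paper's embedding construction.

```python
import numpy as np

def poincare_dist(u, v):
    """Geodesic distance in the Poincaré ball model of hyperbolic space."""
    diff = u - v
    num = 2.0 * (diff @ diff)
    den = (1.0 - u @ u) * (1.0 - v @ v)
    return np.arccosh(1.0 + num / den)

# A depth-k node at Euclidean radius tanh(k/2) lies at hyperbolic distance
# exactly k from the root (the origin). Two leaves in different branches
# stay inside the unit disk in Euclidean terms, yet their hyperbolic
# separation grows like 2k, matching their tree distance.
for k in (1, 4, 8, 16):
    a = np.tanh(k / 2) * np.array([1.0, 0.0])  # leaf in branch 1, depth k
    b = np.tanh(k / 2) * np.array([0.0, 1.0])  # leaf in branch 2, depth k
    print(f"depth={k:2d}  euclid={np.linalg.norm(a - b):.4f}  "
          f"hyper={poincare_dist(a, b):6.2f}  tree=2k={2 * k}")
```

The Euclidean gap saturates near $\sqrt{2}$ while the hyperbolic distance keeps pace with the tree metric, which is why an $O(1)$-Lipschitz predictor over the hyperbolic representation can realize hierarchical targets that force $\exp(\Omega(R))$ Lipschitz constants in any bounded Euclidean embedding.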