Tree-Wasserstein Distance for High Dimensional Data with a Latent Feature Hierarchy

📅 2024-10-28
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the difficulty of modeling implicit hierarchical structure among high-dimensional features, and the lack of semantic interpretability in conventional distance metrics, this paper introduces a new tree-Wasserstein distance (TWD). The method embeds the features into a multi-scale hyperbolic space using diffusion geometry to obtain a low-distortion hierarchical embedding, and then applies a tree-decoding algorithm to explicitly reconstruct the feature hierarchy. Unlike prevailing approaches that embed samples in hyperbolic space, this TWD uses its inherent tree to learn the latent feature hierarchy directly. The authors prove that the TWD computed from data observations recovers the TWD defined with the true latent hierarchy. Empirically, on word-document and single-cell RNA-sequencing datasets, the method outperforms existing tree-Wasserstein baselines and methods based on pre-trained models while remaining computationally efficient and scalable.
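At the core of any TWD is a closed-form expression: the optimal transport cost on a tree reduces to a weighted sum, over edges, of the absolute difference in probability mass contained in the subtree below each edge. The minimal sketch below illustrates this closed form only; it is not the paper's code, and the parent-pointer representation and function names are assumptions for the example.

```python
import numpy as np

def tree_wasserstein(parent, edge_weight, mu, nu):
    """Closed-form tree-Wasserstein distance between histograms mu and nu
    supported on the nodes of a rooted tree.

    parent[i]      -- index of node i's parent (root has parent -1)
    edge_weight[i] -- weight of the edge from node i up to its parent
    Assumes nodes are topologically ordered so that parent[i] < i.
    """
    n = len(parent)
    # Signed mass difference at each node, then accumulated over subtrees
    # by visiting children before their parents.
    subtree = np.asarray(mu, dtype=float) - np.asarray(nu, dtype=float)
    for i in range(n - 1, -1, -1):
        if parent[i] >= 0:
            subtree[parent[i]] += subtree[i]
    # One term per edge: weight times absolute subtree mass imbalance.
    return sum(edge_weight[i] * abs(subtree[i]) for i in range(n) if parent[i] >= 0)

# Toy tree: root 0 with children 1 and 2; node 3 hangs below node 1.
parent = [-1, 0, 0, 1]
weight = [0.0, 1.0, 1.0, 0.5]
mu, nu = [0, 0, 0.5, 0.5], [0, 0.5, 0.5, 0]
print(tree_wasserstein(parent, weight, mu, nu))  # 0.25: move 0.5 mass from node 3 to node 1
```

This linear-time formula is what makes TWD scalable; the paper's contribution lies in how the tree itself is learned from the features.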

📝 Abstract
Finding meaningful distances between high-dimensional data samples is an important scientific task. To this end, we propose a new tree-Wasserstein distance (TWD) for high-dimensional data with two key aspects. First, our TWD is specifically designed for data with a latent feature hierarchy, i.e., the features lie in a hierarchical space, in contrast to the usual focus on embedding samples in hyperbolic space. Second, while the conventional use of TWD is to speed up the computation of the Wasserstein distance, we use its inherent tree as a means to learn the latent feature hierarchy. The key idea of our method is to embed the features into a multi-scale hyperbolic space using diffusion geometry and then present a new tree decoding method by establishing analogies between the hyperbolic embedding and trees. We show that our TWD computed based on data observations provably recovers the TWD defined with the latent feature hierarchy and that its computation is efficient and scalable. We showcase the usefulness of the proposed TWD in applications to word-document and single-cell RNA-sequencing datasets, demonstrating its advantages over existing TWDs and methods based on pre-trained models.
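To make the pipeline concrete, the following is a rough sketch of the diffusion-geometry ingredient: a row-stochastic diffusion operator built on the features (the columns of the data matrix) and its dyadic powers, which supply the multiple scales behind the hyperbolic embedding. The kernel bandwidth `eps`, the number of scales, and the function name are illustrative assumptions; the paper's actual construction of the hyperbolic coordinates and the subsequent tree decoding are more involved.

```python
import numpy as np

def multiscale_feature_diffusion(X, eps=1.0, n_scales=4):
    """Dyadic powers of a feature-level diffusion operator.

    X        -- (n_samples, n_features) data matrix
    returns  -- [P, P^2, P^4, ...], each an (n_features, n_features) operator
    """
    F = X.T                                          # treat features as points
    sq = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / eps)                            # Gaussian affinity between features
    P = K / K.sum(axis=1, keepdims=True)             # row-stochastic diffusion operator
    scales, Pk = [], P
    for _ in range(n_scales):
        scales.append(Pk)                            # P^(2^k): geometry at scale k
        Pk = Pk @ Pk                                 # square to double the diffusion time
    return scales
```

Each successive power smooths the feature geometry at a coarser scale, which is what allows a hierarchy to be read off from the resulting embedding.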
Problem

Research questions and friction points this paper is trying to address.

High-dimensional data distance
Latent feature hierarchy
Efficient tree-Wasserstein computation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tree-Wasserstein Distance for hierarchy
Embed features in hyperbolic space
Efficient, scalable tree decoding method (see the sketch below)
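As a generic stand-in for the tree decoding step (the paper's own decoder works directly on the hyperbolic embedding and comes with recovery guarantees; this is only an illustrative baseline), a tree over the features can be recovered from pairwise distances with single-linkage agglomerative clustering:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import squareform

def decode_tree(dist):
    """Decode a binary tree over features from a symmetric distance matrix.

    dist -- (n_features, n_features) pairwise distances, e.g. multi-scale
            diffusion or hyperbolic distances between features
    """
    condensed = squareform(dist, checks=False)       # condensed upper-triangle form
    Z = linkage(condensed, method="single")          # merge closest clusters first
    return to_tree(Z)                                # scipy ClusterNode; leaves = features
```

The resulting tree, with edge weights derived from the merge heights, is exactly the object the closed-form TWD sketched above consumes.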