🤖 AI Summary
Problem: Existing methods fail to jointly model hierarchical structures among both samples and features in high-dimensional data (e.g., word–document matrices, single-cell RNA-seq), where both dimensions exhibit tree-like hierarchies.
Method: We propose the first coupled hierarchical modeling framework that jointly learns sample and feature trees. Our approach introduces the Tree-Wasserstein Distance (TWD) as a unified optimization objective, integrating diffusion geometry for local similarity, hyperbolic geometry for hierarchical representation, and wavelet filtering for multi-scale feature extraction; we further design an unsupervised iterative algorithm with theoretical convergence guarantees. The framework seamlessly embeds into Hyperbolic Graph Convolutional Networks (HGCN).
Results: Experiments demonstrate significant improvements over baselines in sparse approximation and unsupervised Wasserstein distance learning. Moreover, integrating the method into HGCN improves performance on link prediction and node classification tasks.
📝 Abstract
In many applications, both data samples and features have underlying hierarchical structures. However, existing methods for learning these latent structures typically focus on either samples or features, ignoring possible coupling between them. In this paper, we introduce a coupled hierarchical structure learning method using tree-Wasserstein distance (TWD). Our method jointly computes TWDs for samples and features, representing their latent hierarchies as trees. We propose an iterative, unsupervised procedure to build these sample and feature trees based on diffusion geometry, hyperbolic geometry, and wavelet filters. We show that this iterative procedure converges and empirically improves the quality of the constructed trees. The method is also computationally efficient and scales well in high-dimensional settings. Our method can be seamlessly integrated with hyperbolic graph convolutional networks (HGCN). We demonstrate that our method outperforms competing approaches in sparse approximation and unsupervised Wasserstein distance learning on several word-document and single-cell RNA-sequencing datasets. In addition, integrating our method into HGCN enhances performance in link prediction and node classification tasks.
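The tree-Wasserstein distance at the core of the method admits a simple closed form: for distributions supported on a rooted tree with edge weights, TWD equals the weighted sum, over all edges, of the absolute difference in probability mass contained in the subtree below each edge. A minimal sketch of this standard formula (the function and variable names are illustrative and not taken from the paper's implementation):

```python
import numpy as np

def tree_wasserstein_distance(parent, edge_weight, mu, nu):
    """Tree-Wasserstein distance between distributions mu and nu on a tree.

    parent[i] is the parent of node i (root 0 has parent -1), with nodes
    ordered so that parent[i] < i. edge_weight[i] is the weight of the
    edge (i, parent[i]). TWD = sum over edges of
    edge_weight[i] * |net (mu - nu) mass in the subtree rooted at i|.
    """
    n = len(parent)
    diff = np.asarray(mu, dtype=float) - np.asarray(nu, dtype=float)
    # Accumulate subtree mass differences bottom-up (children before parents).
    subtree = diff.copy()
    for i in range(n - 1, 0, -1):
        subtree[parent[i]] += subtree[i]
    return sum(edge_weight[i] * abs(subtree[i]) for i in range(1, n))

# Example: root 0 with two leaf children 1 and 2, unit edge weights.
parent = [-1, 0, 0]
w = [0.0, 1.0, 1.0]
mu = [0.0, 1.0, 0.0]  # all mass on leaf 1
nu = [0.0, 0.0, 1.0]  # all mass on leaf 2
print(tree_wasserstein_distance(parent, w, mu, nu))  # → 2.0 (leaf-to-leaf path length)
```

This closed form is what makes TWD attractive as an optimization objective: it runs in linear time in the tree size, in contrast to the general Wasserstein distance, which requires solving a transport problem.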