EmbedOR: Provable Cluster-Preserving Visualizations with Curvature-Based Stochastic Neighbor Embeddings

๐Ÿ“… 2025-09-03
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Stochastic Neighbor Embedding (SNE) methods such as t-SNE and UMAP often disrupt submanifold connectivity, distort geometric structure, and produce erroneous clustering when applied to high-dimensional noisy data. Method: We propose EmbedORโ€”the first SNE variant incorporating discrete Ollivier curvature into the embedding framework. EmbedOR constructs a curvature-augmented distance metric, theoretically extending t-SNEโ€™s consistency guarantees, and explicitly strengthens inter-cluster boundaries while preserving local structure via targeted objective optimization. The method further enables fragmentation detection and geometry-aware interpretability through visual annotation. Results: Experiments on synthetic and real-world datasets demonstrate that EmbedOR significantly mitigates fragmentation in high-density regions, recovers ground-truth cluster structures more accurately, and improves geometric fidelity and visualization reliability of dimensionality-reduced embeddings.

Technology Category

Application Category

๐Ÿ“ Abstract
Stochastic Neighbor Embedding (SNE) algorithms like UMAP and tSNE often produce visualizations that do not preserve the geometry of noisy and high dimensional data. In particular, they can spuriously separate connected components of the underlying data submanifold and can fail to find clusters in well-clusterable data. To address these limitations, we propose EmbedOR, a SNE algorithm that incorporates discrete graph curvature. Our algorithm stochastically embeds the data using a curvature-enhanced distance metric that emphasizes underlying cluster structure. Critically, we prove that the EmbedOR distance metric extends consistency results for tSNE to a much broader class of datasets. We also describe extensive experiments on synthetic and real data that demonstrate the visualization and geometry-preservation capabilities of EmbedOR. We find that, unlike other SNE algorithms and UMAP, EmbedOR is much less likely to fragment continuous, high-density regions of the data. Finally, we demonstrate that the EmbedOR distance metric can be used as a tool to annotate existing visualizations to identify fragmentation and provide deeper insight into the underlying geometry of the data.
Problem

Research questions and friction points this paper is trying to address.

Preserve geometry in noisy high-dimensional data visualizations
Prevent spurious separation of connected data components
Improve cluster detection in well-clusterable datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curvature-enhanced distance metric for embedding
Extends consistency to broader dataset classes
Reduces fragmentation in high-density data regions
๐Ÿ”Ž Similar Papers