Cluster and then Embed: A Modular Approach for Visualization

📅 2025-08-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Dimensionality reduction methods such as t-SNE and UMAP preserve local structure and yield well-separated clusters, but they often distort global geometry. To address this, we propose a modular visualization framework based on a “cluster-then-embed” paradigm: clustering (e.g., K-means, DBSCAN) and embedding (e.g., t-SNE, UMAP) are decoupled, each cluster is embedded separately, and an explicit alignment step positions the local embeddings relative to one another to form a global embedding. The aim is to preserve local fidelity while keeping the global structure consistent, and the modular design makes each step of the procedure easier to inspect and interpret. Experiments on several synthetic and real-world datasets show that the method is competitive with existing approaches while being much more transparent.

📝 Abstract

Dimensionality reduction methods such as t-SNE and UMAP are popular for visualizing data with a potential (latent) clustered structure. They are known to group data points at the same time as they embed them, resulting in visualizations with well-separated clusters that preserve local information well. However, t-SNE and UMAP also tend to distort the global geometry of the underlying data. We propose a more transparent, modular approach consisting of first clustering the data, then embedding each cluster, and finally aligning the clusters to obtain a global embedding. We demonstrate this approach on several synthetic and real-world datasets and show that it is competitive with existing methods, while being much more transparent.

Problem

Research questions and friction points this paper is trying to address.

Separates clustering and embedding for transparent visualization

Preserves global geometry while maintaining local cluster structure

Addresses distortion issues in t-SNE and UMAP methods
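
One rough way to put a number on the global distortion mentioned above is to compare the distances between cluster centroids before and after embedding. The sketch below is an illustrative assumption, not a metric taken from the paper: the function name `centroid_distance_correlation` is hypothetical, and Pearson correlation of centroid distances is only one of several possible proxies for "global geometry".

```python
import numpy as np


def centroid_distance_correlation(X, Y, labels):
    """Pearson correlation between pairwise centroid distances in X and in Y.

    X: (n, d) original data, Y: (n, 2) embedding, labels: (n,) cluster ids.
    Values near 1 suggest the relative placement of clusters is preserved.
    Hypothetical helper for illustration; needs at least three clusters.
    """
    ids = np.unique(labels)
    # Cluster centroids in the original space and in the embedding.
    cx = np.array([X[labels == j].mean(axis=0) for j in ids])
    cy = np.array([Y[labels == j].mean(axis=0) for j in ids])
    # Upper-triangle indices select each unordered pair of clusters once.
    iu = np.triu_indices(len(ids), k=1)
    dx = np.linalg.norm(cx[:, None] - cx[None], axis=2)[iu]
    dy = np.linalg.norm(cy[:, None] - cy[None], axis=2)[iu]
    return float(np.corrcoef(dx, dy)[0, 1])
```

A score like this only looks at where clusters sit relative to one another; it says nothing about how well the points inside each cluster are laid out.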

Innovation

Methods, ideas, or system contributions that make the work stand out.

Clustering data before embedding

Embedding each cluster separately

Aligning clusters for global embedding
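
The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration under loud assumptions, not the authors' implementation: plain Lloyd's k-means stands in for the clustering step, PCA stands in for the per-cluster embedding (where t-SNE or UMAP could be used instead), and the alignment step simply places each local embedding at the PCA position of its cluster centroid. The function names and the `local_scale` parameter are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (stand-in clustering step)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    def assign(c):
        # Index of the nearest centroid for every point.
        d = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2)
        return d.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        # Recompute centroids; keep the old one if a cluster went empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign(centroids), centroids


def pca_2d(X):
    """Project rows of X onto their top two principal axes (zero-padded)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ vt[:2].T
    out = np.zeros((len(X), 2))
    out[:, :proj.shape[1]] = proj
    return out


def cluster_then_embed(X, k, local_scale=0.3, seed=0):
    """Cluster, embed each cluster separately, then align into one 2-D map.

    local_scale is a hypothetical knob that shrinks each local embedding
    so that neighbouring clusters are less likely to overlap.
    """
    # Step 1: cluster the data.
    labels, centroids = kmeans(X, k, seed=seed)
    # Step 3 (prepared early): a global 2-D position for every cluster,
    # assumed here to be the PCA projection of the centroids.
    anchors = pca_2d(centroids)
    Y = np.zeros((len(X), 2))
    for j in range(k):
        mask = labels == j
        if not mask.any():
            continue
        # Step 2: embed this cluster on its own.
        local = pca_2d(X[mask])
        # Step 3: translate the local embedding to its global anchor.
        Y[mask] = anchors[j] + local_scale * local
    return Y, labels
```

Because each stage is a separate function, any one of them can be swapped out or inspected on its own, which is the kind of transparency the modular design is meant to provide. Calling `cluster_then_embed(X, k=3)` on an `(n, d)` array returns an `(n, 2)` embedding together with the cluster labels.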
