t-SNE Exaggerates Clusters, Provably

📅 2025-10-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: t-SNE visualizations systematically exaggerate the clustering structure of the input data, which leads readers to misjudge true cluster cohesion and outlier extremity. Method: We give a rigorous theoretical proof that t-SNE's objective function inherently amplifies inter-cluster separation and misrepresents the extremity of outliers; both distortions arise from its gradient dynamics and attraction-repulsion mechanism. We also validate this bias empirically on synthetic and real-world datasets, showing that these failure modes are prevalent in practice. Contribution/Results: We establish that inter-cluster distances and outlier positions in a t-SNE embedding are unreliable proxies for the underlying data structure, which contradicts the common conviction that the structure of a t-SNE visualization roughly matches that of the input. Our analysis delineates fundamental limits on how t-SNE plots can be interpreted in exploratory data analysis. In particular, we caution against drawing quantitative structural conclusions, such as the strength of cluster separation or the significance of an outlier, directly from t-SNE embeddings.

📝 Abstract
Central to the widespread use of t-distributed stochastic neighbor embedding (t-SNE) is the conviction that it produces visualizations whose structure roughly matches that of the input. To the contrary, we prove that (1) the strength of the input clustering, and (2) the extremity of outlier points, cannot be reliably inferred from the t-SNE output. We demonstrate the prevalence of these failure modes in practice as well.
Problem

Research questions and friction points this paper is trying to address.

t-SNE exaggerates cluster structures in visualizations
Input clustering strength cannot be reliably inferred
Outlier extremity cannot be accurately determined from output
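
One practical response to these points is to measure clustering strength on the input itself and then apply the same measurement to the embedding. The sketch below is an illustration only, not the paper's method: `separation_ratio` is a hypothetical helper, and the ratio of mean between-cluster distance to mean within-cluster distance is an assumed, deliberately simple proxy for clustering strength.

```python
import math
from itertools import combinations


def separation_ratio(points, labels):
    """Mean between-cluster distance divided by mean within-cluster distance.

    Hypothetical helper for illustration; larger values mean the labeled
    groups are farther apart relative to their internal spread.
    """
    within, between = [], []
    # Visit every unordered pair of points exactly once.
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (within if lp == lq else between).append(math.dist(p, q))
    if not within or not between:
        raise ValueError("need at least one same-label and one cross-label pair")
    return (sum(between) / len(between)) / (sum(within) / len(within))


# Toy 2-D input with two labeled groups (made-up data for illustration).
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
labels = [0, 0, 1, 1]
ratio_input = separation_ratio(points, labels)
print(f"input separation ratio: {ratio_input:.3f}")
```

Calling the same function on the rows of a t-SNE embedding of the data (for instance the array returned by `fit_transform` on scikit-learn's `sklearn.manifold.TSNE`) and comparing the two ratios gives a rough indication of how much separation the visualization introduced that the input does not have.
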
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proves that t-SNE exaggerates cluster structure
Shows that the strength of the input clustering cannot be reliably inferred from the t-SNE output
Demonstrates that the extremity of outlier points is misrepresented in the output