Compressibility Barriers to Neighborhood-Preserving Data Visualizations

📅 2025-08-09
🤖 AI Summary
This paper investigates the theoretical limits of preserving neighborhood structure in low-dimensional visualizations of high-dimensional data. It asks whether the neighborhood relations of high-dimensional data can be reliably preserved in constant-dimensional spaces (e.g., 2D or 3D), and uses the *doubling dimension* $d$ of the target metric space as the measure of embedding difficulty. Drawing on graph embedding theory, metric space analysis, and planted partition models, it characterizes how compressible the neighborhood structure of different graph classes is. The main results are: (i) almost all $n$-vertex graphs require $d = \Omega(\log n)$ to keep neighbors separated from non-neighbors; (ii) almost all sparse regular graphs still require $d = \Omega(\log n / \log\log n)$; and (iii) when the target is a normed space, almost all graphs require $d = \Theta(n)$. The work gives an information-theoretic and geometric account of the dimensional bottleneck facing common dimensionality reduction techniques (e.g., t-SNE, UMAP), with direct consequences for how visualizations are designed and interpreted.

📝 Abstract
To what extent is it possible to visualize high-dimensional datasets in a two- or three-dimensional space? We reframe this question in terms of embedding $n$-vertex graphs (representing the neighborhood structure of the input points) into metric spaces of low doubling dimension $d$, in such a way that maintains the separation between neighbors and non-neighbors. This seemingly lax embedding requirement is surprisingly difficult to satisfy. Our investigation shows that an overwhelming fraction of graphs require $d = \Omega(\log n)$. Even when considering sparse regular graphs, the situation does not improve, as an overwhelming fraction of such graphs requires $d = \Omega(\log n / \log\log n)$. The landscape changes dramatically when embedding into normed spaces. In particular, all but a vanishing fraction of graphs demand $d = \Theta(n)$. Finally, we study the implications of these results for visualizing data with intrinsic cluster structure. We find that graphs produced from a planted partition model with $k$ clusters on $n$ points typically require $d = \Omega(\log n)$, even when the cluster structure is salient. These results challenge the aspiration that constant-dimensional visualizations can faithfully preserve neighborhood structure.
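
To make the embedding requirement concrete, the following minimal Python sketch checks one plausible reading of "maintains the separation between neighbors and non-neighbors": for every vertex, each graph neighbor must land strictly closer than each non-neighbor. This per-vertex criterion, the Euclidean target space, and the names `preserves_neighborhoods`, `path_edges`, `good`, and `bad` are assumptions made for illustration; the paper's formal definition may differ.

```python
from math import dist  # Euclidean distance, Python 3.8+


def preserves_neighborhoods(edges, points):
    """Check an ASSUMED separation criterion for an embedded graph.

    edges  : iterable of (u, v) pairs over vertices 0..n-1 (simple graph)
    points : list of n coordinate tuples, points[v] is the image of vertex v

    Returns True if, for every vertex v, all neighbors of v are strictly
    closer to v than all non-neighbors of v.
    """
    n = len(points)
    nbrs = [set() for _ in range(n)]
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    for v in range(n):
        near = [dist(points[v], points[u]) for u in nbrs[v]]
        far = [
            dist(points[v], points[u])
            for u in range(n)
            if u != v and u not in nbrs[v]
        ]
        # A vertex with no neighbors or no non-neighbors imposes no constraint.
        if near and far and max(near) >= min(far):
            return False
    return True


# A 4-vertex path 0-1-2-3 placed in order on a line keeps neighbors closest.
path_edges = [(0, 1), (1, 2), (2, 3)]
good = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
# Moving vertex 1 far from vertex 0 breaks the separation at vertex 0.
bad = [(0.0, 0.0), (3.0, 0.0), (1.0, 0.0), (2.0, 0.0)]

print(preserves_neighborhoods(path_edges, good))
print(preserves_neighborhoods(path_edges, bad))
```

A path embeds easily because its structure is essentially one-dimensional. The lower bounds in the abstract say that for most graphs no placement of the points in a space of constant doubling dimension can satisfy a separation requirement of this kind.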

Problem

Research questions and friction points this paper is trying to address.

Visualizing high-dimensional data in low-dimensional spaces
Embedding graphs into low doubling dimension metric spaces
Preserving neighborhood structure in data visualizations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Embed graphs into low doubling dimension spaces
Analyze sparse regular graphs for embedding requirements
Study implications for visualizing clustered data