When One Point Is Not Enough: Addressing Ambiguous Instances in Dimensionality Reduction by Splitting

📅 2026-05-22
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses a limitation of dimensionality reduction (DR) visualization: “ambiguous instances”, which are highly similar to multiple mutually dissimilar neighborhoods in the high-dimensional space, cannot be faithfully represented by a single point in the embedding, so part of their neighborhood structure is lost. The authors propose a graph-based instance-splitting approach that first identifies such instances and then replicates each one as multiple points in the projection, placing each copy within its respective neighborhood. By relaxing the one-point-per-instance constraint of standard DR methods, the technique reduces the distortion the authors call partial neighborhood embedding. Results are shown with UMAP, and the approach is described as generalizing to other local graph-based DR techniques. It reveals previously hidden neighborhood memberships across multiple examples, and these findings are supported by quantitative analyses.
📝 Abstract
Dimensionality Reduction (DR) methods are widely used to visualize high-dimensional data. One key task in DR-based analysis is discovering neighborhoods, which relies on analyzing the fine-grained local structure of a projection. However, DR is an inherently lossy process; no technique can perfectly preserve the high-dimensional relationships, and projections therefore contain visual artifacts. In this paper, we highlight a typically overlooked source of visual artifacts: ambiguous instances. These are instances that are highly similar to multiple mutually dissimilar neighborhoods in the high-dimensional space. Standard DR methods cannot faithfully project such instances, since each data instance is mapped to a single point in the visual space. As a result, such an instance is placed in only one of its neighborhoods (or in none at all), so only part of its neighborhood structure is represented. We call this distortion partial neighborhood embedding. We introduce a graph-based approach that identifies ambiguous instances and replicates them as multiple points in the projection, placing each copy within its respective neighborhood. We use UMAP for our results, but our approach also generalizes to other local graph-based DR techniques. We show that it reveals previously hidden neighborhood memberships in projections and reduces partial neighborhood embedding across multiple examples; these findings are further supported by quantitative analyses.
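To make the idea in the abstract concrete, here is a minimal sketch of one way ambiguous instances might be detected and split on a k-nearest-neighbor graph. It is not the authors' algorithm, which this page does not specify. The heuristic is an assumption: an instance counts as ambiguous when its neighbors fall into two or more groups that are not connected to each other in the kNN graph. All names (`knn_indices`, `neighbor_groups`, `split_ambiguous`, `replicate_ids`, `min_group_size`) are hypothetical.

```python
import numpy as np

def knn_indices(X, k):
    """Brute-force k nearest neighbors of every row of X (self excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def neighbor_groups(i, knn):
    """Connected components of the kNN graph restricted to i's neighbors."""
    nbrs = [int(j) for j in knn[i]]
    nbr_set = set(nbrs)
    adj = {j: set() for j in nbrs}
    for j in nbrs:
        for m in knn[j]:
            m = int(m)
            if m in nbr_set and m != j:
                adj[j].add(m)   # treat kNN edges as undirected
                adj[m].add(j)
    groups, seen = [], set()
    for j in nbrs:
        if j in seen:
            continue
        stack, comp = [j], []
        seen.add(j)
        while stack:            # depth-first traversal of one component
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        groups.append(sorted(comp))
    return sorted(groups)

def split_ambiguous(knn, min_group_size=2):
    """Assumed heuristic: ambiguous = neighbors form >= 2 disconnected groups."""
    result = {}
    for i in range(len(knn)):
        groups = [g for g in neighbor_groups(i, knn) if len(g) >= min_group_size]
        if len(groups) >= 2:
            result[i] = groups
    return result

def replicate_ids(n, ambiguous):
    """Give each extra copy a fresh node id; the first group keeps the original id."""
    copies, next_id = {}, n
    for i, groups in ambiguous.items():
        ids = [i]
        for _ in groups[1:]:
            ids.append(next_id)
            next_id += 1
        copies[i] = list(zip(ids, groups))
    return copies

# Toy data: two tight clusters and one point (index 8) halfway between them.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
              [10, 0], [9, 0], [10, 1], [9, 1],
              [5, 0]], dtype=float)
knn = knn_indices(X, k=4)
ambiguous = split_ambiguous(knn)
copies = replicate_ids(len(X), ambiguous)
print(ambiguous)
print(copies)
```

In this toy setup only the midpoint is flagged, and it receives one copy per neighbor group. Each copy would keep only the graph edges to its own group before the expanded graph is handed to a local graph-based DR method; that wiring (for example into UMAP) is deliberately left out, since it depends on details not given here.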
Problem

Research questions and friction points this paper is trying to address.

dimensionality reduction
ambiguous instances
neighborhood structure
visual artifacts
partial neighborhood embedding
Innovation

Methods, ideas, or system contributions that make the work stand out.

ambiguous instances
partial neighborhood embedding
dimensionality reduction
graph-based DR
neighborhood replication