Why Can't I See My Clusters? A Precision-Recall Approach to Dimensionality Reduction Validation

📅 2025-09-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing dimensionality reduction (DR) quality metrics do not explain why an expected cluster structure fails to appear in a low-dimensional projection. To address this, we propose a supervised precision-recall evaluation framework that targets the relationship phase of DR, where similarity relationships are modeled before the data is projected. It converts labeled cluster structure into pairwise similarity constraints and quantifies how well the modeled neighborhood relationships align with them. The method helps attribute missing structure to its cause (e.g., neighborhood collapse or outlier-induced distortion) and supports hyperparameter tuning and artifact diagnosis for algorithms such as t-SNE and UMAP. The approach is illustrated with t-SNE and UMAP and validated through several usage scenarios, in which the metrics help uncover projection artifacts and show whether the expected structure is captured in the relationships at all.

📝 Abstract

Dimensionality Reduction (DR) is widely used for visualizing high-dimensional data, often with the goal of revealing expected cluster structure. However, such a structure may not always appear in the projections. Existing DR quality metrics assess projection reliability (to some extent) or cluster structure quality, but do not explain why expected structures are missing. Visual Analytics solutions can help, but are often time-consuming due to the large hyperparameter space. This paper addresses this problem by leveraging a recent framework that divides the DR process into two phases: a relationship phase, where similarity relationships are modeled, and a mapping phase, where the data is projected accordingly. We introduce two supervised metrics, precision and recall, to evaluate the relationship phase. These metrics quantify how well the modeled relationships align with an expected cluster structure based on some set of labels representing this structure. We illustrate their application using t-SNE and UMAP, and validate the approach through various usage scenarios. Our approach can guide hyperparameter tuning, uncover projection artifacts, and determine if the expected structure is captured in the relationships, making the DR process faster and more reliable.
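To make the idea of scoring modeled relationships against labels concrete, here is a minimal sketch. It is not the paper's definition: it assumes the relationship phase is approximated by a plain k-nearest-neighbor graph (t-SNE and UMAP use weighted variants), and the helper names `knn_relationships` and `pairwise_precision_recall` are hypothetical. Precision is taken as the fraction of modeled relationships that join same-label points, and recall as the fraction of same-label pairs that appear among the modeled relationships.

```python
import numpy as np

def knn_relationships(X, k):
    """Boolean (n, n) matrix: entry [i, j] is True if j is one of i's k nearest neighbors.

    Assumption: a plain kNN graph stands in for the relationship phase.
    """
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # a point is never its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]        # indices of the k closest points per row
    rel = np.zeros(d.shape, dtype=bool)
    rows = np.repeat(np.arange(len(X)), k)
    rel[rows, nn.ravel()] = True
    return rel

def pairwise_precision_recall(rel, labels):
    """Score directed relationships against same-label pairs (illustrative definition)."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)            # ignore self-pairs
    tp = np.logical_and(rel, same).sum()     # relationships that join same-label points
    n_rel, n_same = rel.sum(), same.sum()
    precision = float(tp / n_rel) if n_rel else 0.0
    recall = float(tp / n_same) if n_same else 0.0
    return precision, recall

# Toy data: two well-separated clusters of three points each.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels = [0, 0, 0, 1, 1, 1]
print(pairwise_precision_recall(knn_relationships(X, 2), labels))
```

Under this reading, low precision suggests the neighborhoods reach across the expected clusters, while low recall suggests they are too small to connect each cluster internally.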

Problem

Research questions and friction points this paper is trying to address.

Evaluating why expected cluster structures are missing in dimensionality reduction visualizations

Quantifying alignment between modeled relationships and expected cluster structures

Guiding hyperparameter tuning and uncovering projection artifacts in DR methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Precision-recall metrics for relationship phase

Supervised evaluation using expected cluster labels

Guiding hyperparameter tuning and artifact detection
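As an illustration of how such metrics could guide hyperparameter tuning, the sketch below sweeps a neighborhood size and reports precision and recall for each value. It rests on stated assumptions rather than the paper's procedure: a plain k-nearest-neighbor graph stands in for the relationship phase, `k` plays the role that perplexity (t-SNE) or `n_neighbors` (UMAP) would play, and the function names are hypothetical.

```python
import numpy as np

def precision_recall_for_k(X, labels, k):
    """Precision and recall of a directed kNN graph against same-label pairs.

    Assumption: kNN neighborhoods approximate the modeled relationships.
    """
    labels = np.asarray(labels)
    n = len(labels)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                    # exclude self-neighbors
    nn = np.argsort(d, axis=1)[:, :k]              # (n, k) neighbor indices
    tp = int((labels[nn] == labels[:, None]).sum())  # neighbors sharing the label
    same_pairs = int((labels[:, None] == labels[None, :]).sum()) - n
    precision = tp / (n * k)
    recall = tp / same_pairs if same_pairs else 0.0
    return precision, recall

def sweep_neighborhood_size(X, labels, ks):
    """Map each candidate k to its (precision, recall) pair."""
    return {k: precision_recall_for_k(X, labels, k) for k in ks}

# Toy data: two well-separated clusters of three points each.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels = [0, 0, 0, 1, 1, 1]
for k, (p, r) in sweep_neighborhood_size(X, labels, [1, 2, 3]).items():
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```

On this toy data, recall rises as `k` grows until each cluster is fully connected, after which precision starts to fall because neighborhoods spill into the other cluster. A sweep like this can narrow the hyperparameter range before any projection is computed.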

🔎 Similar Papers

HUMAP: Hierarchical Uniform Manifold Approximation and Projection