Surpassing Cosine Similarity for Multidimensional Comparisons: Dimension Insensitive Euclidean Metric (DIEM)

📅 2024-07-11

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

In high-dimensional data, conventional similarity measures such as cosine similarity become hard to interpret and lose discriminative power because their values depend on the dimensionality of the vectors being compared. To address this, the paper proposes the Dimension Insensitive Euclidean Metric (DIEM), a Euclidean-style metric designed to be insensitive to dimensionality. The paper characterizes how dimensionality biases cosine similarity, and DIEM removes the drift in variability across dimensions through normalization and dimension-dependent scaling, so that similarity values remain consistent and comparable regardless of dimensionality. Experiments across several scenarios, including electromyographic synergy analysis, PCA embedding, and clustering, are reported to show improved discriminability and interpretability. DIEM's variability is reported to remain stable across dimensions, in contrast to cosine similarity and the other baselines. The authors position DIEM as a robust and interpretable way to compare high-dimensional vectors.

📝 Abstract

Advancements in computational power and hardware efficiency have made it possible to tackle increasingly complex and high-dimensional problems. While artificial intelligence (AI) has achieved remarkable results, the interpretability of high-dimensional solutions remains challenging. A critical issue is the comparison of multidimensional quantities, which is essential in techniques such as Principal Component Analysis (PCA) and k-means clustering. Common metrics such as cosine similarity, Euclidean distance, and Manhattan distance are often used for such comparisons, for example in the analysis of muscular synergies of the human motor control system. However, their applicability and interpretability diminish as dimensionality increases. This paper provides a comprehensive analysis of the effects of dimensionality on these metrics. Our results reveal significant limitations of cosine similarity, particularly its dependency on the dimensionality of the vectors, which leads to biased and poorly interpretable outcomes. To address this, we introduce the Dimension Insensitive Euclidean Metric (DIEM), which demonstrates superior robustness and generalizability across dimensions. DIEM maintains consistent variability and eliminates the biases observed in traditional metrics, making it a reliable tool for high-dimensional comparisons. This novel metric has the potential to replace cosine similarity, providing a more accurate and insightful method for analyzing multidimensional data in fields ranging from neuromotor control to machine and deep learning.
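The dimensionality dependence of cosine similarity described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names (`cosine_similarity`, `cosine_spread`), the uniform sampling range of [-1, 1], and the sample sizes are all assumptions chosen for illustration. It draws random vector pairs at several dimensions and reports how the spread of their cosine similarity changes.

```python
import math
import random
import statistics


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_spread(n_dims, n_pairs=2000, seed=0):
    """Mean and standard deviation of cosine similarity between
    random vector pairs with entries drawn uniformly from [-1, 1].
    The sampling scheme is an assumption made for this illustration."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_pairs):
        a = [rng.uniform(-1.0, 1.0) for _ in range(n_dims)]
        b = [rng.uniform(-1.0, 1.0) for _ in range(n_dims)]
        sims.append(cosine_similarity(a, b))
    return statistics.mean(sims), statistics.stdev(sims)


# Compare the spread of cosine similarity at low and high dimension.
for n in (2, 10, 100):
    mean, std = cosine_spread(n)
    print(f"n={n:4d}  mean={mean:+.3f}  std={std:.3f}")
```

Under these assumptions the standard deviation shrinks as the dimension grows, so the same cosine value carries a different meaning at n=2 than at n=100. This is the kind of dimension-dependent behavior the abstract identifies as a limitation.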

Problem

Research questions and friction points this paper is trying to address.

Addresses limitations of cosine similarity in high-dimensional data comparisons.

Introduces Dimension Insensitive Euclidean Metric (DIEM) for robust multidimensional analysis.

Improves interpretability and accuracy in high-dimensional data across various fields.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Dimension Insensitive Euclidean Metric (DIEM)

DIEM eliminates biases in high-dimensional comparisons

DIEM provides consistent variability across dimensions
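The idea of removing dimension-dependent bias from a Euclidean distance can be sketched as follows. This is a simplified illustration, not the paper's definition of DIEM: it assumes that vector entries lie in a known range, estimates the mean and standard deviation of the Euclidean distance between random vectors at a given dimension by Monte Carlo sampling, and then centers and scales an observed distance by those reference values. The names `reference_stats` and `detrended_euclidean` and all default parameters are hypothetical.

```python
import math
import random
import statistics


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reference_stats(n_dims, v_min=0.0, v_max=1.0, n_pairs=2000, seed=0):
    """Monte Carlo estimate of the mean and standard deviation of the
    Euclidean distance between random vectors with entries drawn
    uniformly from [v_min, v_max]. The range and sampling scheme are
    assumptions made for this illustration."""
    rng = random.Random(seed)
    dists = []
    for _ in range(n_pairs):
        a = [rng.uniform(v_min, v_max) for _ in range(n_dims)]
        b = [rng.uniform(v_min, v_max) for _ in range(n_dims)]
        dists.append(euclidean(a, b))
    return statistics.mean(dists), statistics.stdev(dists)


def detrended_euclidean(a, b, ref_mean, ref_std):
    """Distance expressed in standard deviations from the distance
    expected between random vectors of the same dimension.
    Hypothetical helper, not the paper's exact DIEM formula."""
    return (euclidean(a, b) - ref_mean) / ref_std


# The raw expected distance grows with dimension; the detrended score
# is measured relative to that dimension-specific baseline.
for n in (2, 10, 100):
    ref_mean, ref_std = reference_stats(n)
    identical = [0.5] * n
    score = detrended_euclidean(identical, identical, ref_mean, ref_std)
    print(f"n={n:4d}  expected distance={ref_mean:.3f}  "
          f"score for identical vectors={score:.2f}")
```

A score near zero means the two vectors are about as far apart as random vectors of that dimension, while a strongly negative score means they are much closer than chance. Centering and scaling against a dimension-specific baseline is one plausible reading of "consistent variability across dimensions"; the paper should be consulted for the exact normalization DIEM uses.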

🔎 Similar Papers

Metric Space Magnitude for Evaluating the Diversity of Latent Representations