Magnitude Distance: A Geometric Measure of Dataset Similarity

📅 2026-02-09

📈 Citations: 0

✨ Influential: 0

career value

238K/year

🤖 AI Summary

This work proposes a novel distance measure, termed magnitude distance, grounded in the theory of magnitude from metric geometry, to address the failure of traditional distance metrics in high-dimensional data due to the curse of dimensionality. By incorporating an adjustable scale parameter $ t $, the method adaptively captures either global structure or local detail of the data. It is the first to apply magnitude to similarity measurement between datasets, satisfying the metric axioms under specific conditions and demonstrating strong discriminative power in high dimensions. Theoretical analysis reveals its intrinsic multiscale nature, while empirical experiments confirm its effectiveness in high-dimensional settings and its viability as a training objective for generative models.

Technology Category

Application Category

📝 Abstract

Quantifying the distance between datasets is a fundamental question in mathematics and machine learning. We propose \textit{magnitude distance}, a novel distance metric defined on finite datasets using the notion of the \emph{magnitude} of a metric space. The proposed distance incorporates a tunable scaling parameter, $t$, that controls the sensitivity to global structure (small $t$) and finer details (large $t$). We prove several theoretical properties of magnitude distance, including its limiting behavior across scales and conditions under which it satisfies key metric properties. In contrast to classical distances, we show that magnitude distance remains discriminative in high-dimensional settings when the scale is appropriately tuned. We further demonstrate how magnitude distance can be used as a training objective for push-forward generative models. Our experimental results support our theoretical analysis and demonstrate that magnitude distance provides meaningful signals, comparable to established distance-based generative approaches.

Problem

Research questions and friction points this paper is trying to address.

dataset similarity

distance metric

high-dimensional data

magnitude

metric space

Innovation

Methods, ideas, or system contributions that make the work stand out.

magnitude distance

metric space magnitude

dataset similarity