Magnitude Distance: A Geometric Measure of Dataset Similarity

📅 2026-02-09
📈 Citations: 0 (influential: 0)
🤖 AI Summary
This work proposes magnitude distance, a measure of dissimilarity between finite datasets built on the magnitude of a metric space, a notion from metric geometry. It targets a known weakness of classical distances, which lose discriminative power on high-dimensional data (the curse of dimensionality). A tunable scale parameter \( t \) controls whether the measure emphasizes global structure (small \( t \)) or fine detail (large \( t \)). The authors present it as the first use of magnitude for measuring similarity between datasets, prove that it satisfies the metric axioms under specific conditions, and show that it stays discriminative in high dimensions when \( t \) is tuned appropriately. Their theoretical analysis characterizes its limiting behavior across scales, and experiments support both its effectiveness in high-dimensional settings and its use as a training objective for push-forward generative models.
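The summary attributes the failure of classical distances to the curse of dimensionality. One standard symptom is distance concentration: as the dimension grows, the largest and smallest pairwise Euclidean distances in a random sample become nearly equal, so near and far neighbours are hard to tell apart. The sketch below is a minimal illustration, assuming points drawn uniformly from the unit cube and measuring spread with the relative contrast (max - min) / min. The helper name `relative_contrast` and the sampling setup are illustrative choices, not taken from the paper, and the code does not compute magnitude distance.

```python
import numpy as np


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """(max - min) / min over all pairwise Euclidean distances between
    n_points sampled uniformly from the unit cube [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    pts = rng.random((n_points, dim))
    # Squared distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (pts ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * pts @ pts.T
    iu = np.triu_indices(n_points, k=1)        # each distinct pair once
    d = np.sqrt(np.maximum(d2[iu], 0.0))       # clip tiny negative round-off
    return float((d.max() - d.min()) / d.min())


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(200, dim), 3))
```

On this setup the printed contrast shrinks sharply as `dim` grows; the exact values depend on the seed.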

📝 Abstract
Quantifying the distance between datasets is a fundamental question in mathematics and machine learning. We propose \textit{magnitude distance}, a novel distance metric defined on finite datasets using the notion of the \emph{magnitude} of a metric space. The proposed distance incorporates a tunable scaling parameter, $t$, that controls the sensitivity to global structure (small $t$) and finer details (large $t$). We prove several theoretical properties of magnitude distance, including its limiting behavior across scales and conditions under which it satisfies key metric properties. In contrast to classical distances, we show that magnitude distance remains discriminative in high-dimensional settings when the scale is appropriately tuned. We further demonstrate how magnitude distance can be used as a training objective for push-forward generative models. Our experimental results support our theoretical analysis and demonstrate that magnitude distance provides meaningful signals, comparable to established distance-based generative approaches.
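The scale parameter $t$ enters through the magnitude itself. Below is a minimal sketch of the standard magnitude of a finite metric space (Leinster's definition), assuming Euclidean point clouds: form the similarity matrix $Z$ with entries $\exp(-t\,d(x_i, x_j))$, solve $Zw = \mathbf{1}$ for the weighting $w$, and sum $w$. The abstract does not state the paper's definition of magnitude distance, so the code stops at magnitude; the function name `magnitude` and the five-point dataset are hypothetical. It shows the two limits behind the small-$t$ / large-$t$ intuition: for small $t$ the whole set behaves like a single point (magnitude near 1), and for large $t$ every point is resolved separately (magnitude near the number of points).

```python
import numpy as np


def magnitude(points: np.ndarray, t: float) -> float:
    """Magnitude of the point cloud `points` (shape (n, d)) under the
    Euclidean metric rescaled by t > 0.

    Builds Z[i, j] = exp(-t * ||x_i - x_j||), solves Z w = 1 for the
    weighting w, and returns sum(w). For distinct points in Euclidean
    space Z is positive definite, so the solve is well posed.
    """
    pts = np.asarray(points, dtype=float)
    diffs = pts[:, None, :] - pts[None, :, :]    # (n, n, d) pairwise differences
    dist = np.linalg.norm(diffs, axis=-1)         # (n, n) Euclidean distances
    Z = np.exp(-t * dist)                         # similarity matrix at scale t
    w = np.linalg.solve(Z, np.ones(len(pts)))     # weighting vector
    return float(w.sum())


# Hypothetical toy dataset: five distinct points in the plane.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 2]], dtype=float)
for t in (1e-3, 1.0, 50.0):
    # Small t: close to 1 (the set looks like one point).
    # Large t: close to 5 (every point is resolved separately).
    print(f"t={t:g}: magnitude={magnitude(X, t):.4f}")
```

For a two-point space at distance $d$ the same computation gives the closed form $2/(1 + e^{-td})$, which is a convenient sanity check.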

Problem

Research questions and friction points this paper is trying to address.

dataset similarity
distance metric
high-dimensional data
magnitude
metric space

Innovation

Methods, ideas, or system contributions that make the work stand out.

magnitude distance
metric space magnitude
dataset similarity
scale-tunable distance
generative modeling