Cross-Fusion Distance: A Novel Metric for Measuring Fusion and Separability Between Data Groups in Representation Space

📅 2026-01-29
🤖 AI Summary
Existing metrics struggle to distinguish geometric transformations that change how much data clusters fuse from those that do not (such as global scaling or changes in sampling layout), so they quantify fusion and separability in representation space inaccurately. To address this, the work proposes the Cross-Fusion Distance (CFD). Grounded in geometric invariance theory, CFD separates fusion-relevant from fusion-irrelevant geometric factors for the first time, which allows the extent of fusion to be measured precisely. CFD has linear computational complexity. In synthetic experiments it is highly sensitive to fusion-altering transformations while remaining invariant to fusion-irrelevant ones. On real-world domain-shift datasets, CFD correlates more strongly with downstream generalization performance than existing metrics.

📝 Abstract
Quantifying degrees of fusion and separability between data groups in representation space is a fundamental problem in representation learning, particularly under domain shift. A meaningful metric should capture fusion-altering factors like geometric displacement between representation groups, whose variations change the extent of fusion, while remaining invariant to fusion-preserving factors such as global scaling and sampling-induced layout changes, whose variations do not. Existing distributional distance metrics conflate these factors, leading to measures that are not informative of the true extent of fusion between data groups. We introduce Cross-Fusion Distance (CFD), a principled measure that isolates fusion-altering geometry while remaining robust to fusion-preserving variations, with linear computational complexity. We characterize the invariance and sensitivity properties of CFD theoretically and validate them in controlled synthetic experiments. For practical utility on real-world datasets with domain shift, CFD aligns more closely with downstream generalization degradation than commonly used alternatives. Overall, CFD provides a theoretically grounded and interpretable distance measure for representation learning.
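The invariance and sensitivity requirements described in the abstract can be illustrated with a small numerical sketch. The page does not give CFD's formula, so the code below does not implement CFD. It uses a hypothetical stand-in, `normalized_centroid_distance`, purely to show what "sensitive to displacement, invariant to global scaling" means in practice, and contrasts it with a plain centroid distance that conflates the two factors. All function names, sample sizes, and shift values are assumptions chosen for illustration.

```python
import numpy as np


def centroid_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between group means (linear in the number of samples)."""
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))


def rms_spread(x: np.ndarray) -> float:
    """Root-mean-square distance of the samples from their own centroid."""
    centered = x - x.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))


def normalized_centroid_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Hypothetical stand-in metric, NOT the paper's CFD.

    Divides the centroid distance by the pooled within-group spread,
    so multiplying both groups by the same constant cancels out.
    """
    pooled = np.sqrt((rms_spread(a) ** 2 + rms_spread(b) ** 2) / 2.0)
    return centroid_distance(a, b) / pooled


rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(500, 8))
group_b = rng.normal(0.0, 1.0, size=(500, 8)) + 2.0  # displaced group

scale = 10.0  # fusion-preserving factor: global scaling of the whole space

raw_before = centroid_distance(group_a, group_b)
raw_after = centroid_distance(scale * group_a, scale * group_b)

norm_before = normalized_centroid_distance(group_a, group_b)
norm_after = normalized_centroid_distance(scale * group_a, scale * group_b)

# Fusion-altering factor: move group_b further away from group_a.
norm_more_displaced = normalized_centroid_distance(group_a, group_b + 3.0)
```

Under global scaling, the raw centroid distance grows by the scale factor even though the two groups overlap exactly as much as before, while the normalized stand-in is unchanged. Increasing the displacement between the groups raises the normalized value, which is the sensitivity a fusion metric should have. Sampling-induced layout changes, the other fusion-preserving factor named in the abstract, are not modelled in this sketch.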
Problem

Research questions and friction points this paper is trying to address.

representation learning
domain shift
fusion
separability
distributional distance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-Fusion Distance
representation learning
domain shift
distributional distance
fusion-separability metric
Xiaolong Zhang
Shenzhen Institute of Advanced Technology
electrocatalysis, nanomaterials, battery, electrochemical energy conversion, CO2 reduction
Jianwei Zhang
Brenden-Colson Center for Pancreatic Care, Oregon Health and Science University, OR, USA
Xubo Song
CEDAR, Knight Cancer Institute, Oregon Health and Science University, OR, USA