🤖 AI Summary
Problem: Robust comparison of high-dimensional positive measures in the presence of outliers and noise remains challenging.
Method: The paper proposes the Sliced Unbalanced Optimal Transport (SUOT) framework, introducing two novel sliced unbalanced OT loss functions. These are computed with a fast Frank–Wolfe-type optimization algorithm, improving scalability and stability in high dimensions as well as robustness to outliers.
Contribution/Results: The paper establishes the topological structure, statistical consistency, and convergence properties of the proposed losses. SUOT unifies and generalizes sliced OT and unbalanced OT, two dominant paradigms in modern optimal transport. Extensive experiments on synthetic and real-world datasets demonstrate that SUOT outperforms standard OT and its variants in both computational efficiency and robustness, particularly under contamination and in high-dimensional settings.
📝 Abstract
Optimal transport (OT) has emerged as a powerful framework to compare probability measures, a fundamental task in many statistical and machine learning problems. Substantial advances have been made over the last decade in designing OT variants which are either computationally and statistically more efficient, or more robust to the measures and datasets to compare. Among them, sliced OT distances have been extensively used to mitigate optimal transport's cubic algorithmic complexity and curse of dimensionality. In parallel, unbalanced OT was designed to allow comparisons of more general positive measures, while being more robust to outliers. In this paper, we propose to combine these two concepts, namely slicing and unbalanced OT, to develop a general framework for efficiently comparing positive measures. We propose two new loss functions based on the idea of slicing unbalanced OT, and study their induced topology and statistical properties. We then develop a fast Frank-Wolfe-type algorithm to compute these loss functions, and show that the resulting methodology is modular as it encompasses and extends prior related work. We finally conduct an empirical analysis of our loss functions and methodology on both synthetic and real datasets, to illustrate their relevance and applicability.
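To make the slicing-plus-unbalanced idea concrete, the sketch below averages a 1D unbalanced OT cost over random projection directions. This is an illustrative toy, not the paper's method: instead of the Frank–Wolfe-type algorithm and the exact SUOT losses, it uses a simple entropic unbalanced Sinkhorn solver (KL-relaxed marginals), and the function names, the squared-distance cost, and the parameters `eps` and `rho` are assumptions chosen for readability.

```python
import numpy as np

def unbalanced_sinkhorn_1d(a, b, x, y, eps=0.5, rho=10.0, n_iter=200):
    # Entropic unbalanced OT between weighted 1D point clouds (x, a) and (y, b).
    # Marginal constraints are relaxed by KL penalties of strength rho
    # (rho -> infinity recovers balanced entropic OT). Returns <P, C>, the
    # transport cost of the resulting plan P. No log-domain stabilization:
    # assumes the costs are well scaled relative to eps.
    C = (x[:, None] - y[None, :]) ** 2      # squared-distance ground cost
    K = np.exp(-C / eps)                    # Gibbs kernel
    fi = rho / (rho + eps)                  # exponent of the KL proximal step
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = (a / (K @ v)) ** fi
        v = (b / (K.T @ u)) ** fi
    P = u[:, None] * K * v[None, :]         # transport plan
    return float(np.sum(P * C))

def sliced_unbalanced_ot(X, Y, a, b, n_proj=50, eps=0.5, rho=10.0, seed=0):
    # Monte Carlo sliced loss: average the 1D unbalanced OT cost over random
    # projection directions drawn uniformly on the unit sphere.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_proj):
        theta = rng.standard_normal(X.shape[1])
        theta /= np.linalg.norm(theta)      # uniform direction on the sphere
        total += unbalanced_sinkhorn_1d(a, b, X @ theta, Y @ theta, eps, rho)
    return total / n_proj
```

Because each slice only solves a 1D problem, the cost per projection stays low even when the ambient dimension grows, while the KL relaxation lets mass be created or destroyed rather than forcing an outlier to be matched, which is the robustness mechanism the paper builds on.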