🤖 AI Summary
Traditional optimal transport (OT) barycenter estimation suffers from poor robustness under multi-source heterogeneous data contaminated by outliers and severe class imbalance.
Method: This paper proposes a robust OT barycenter estimation framework for continuous distributions. It introduces a novel semi-unbalanced OT dual formulation, explicitly modeling outlier robustness via partial mass relaxation among source distributions. The resulting differentiable min-max optimization problem is compatible with arbitrary cost functions and integrates neural OT solvers with dual optimization for end-to-end training.
Contributions/Results: We provide theoretical guarantees on convergence and statistical robustness. Experiments demonstrate that the method significantly outperforms existing OT barycenter approaches under noise corruption, outlier contamination, and extreme class imbalance, while maintaining high accuracy and scalability.
📝 Abstract
A common challenge in aggregating data from multiple sources can be formalized as an \textit{Optimal Transport} (OT) barycenter problem, which seeks to compute the average of probability distributions with respect to OT discrepancies. However, the presence of outliers and noise in the data measures can significantly hinder the performance of traditional statistical methods for estimating OT barycenters. To address this issue, we propose a novel, scalable approach for estimating the \textit{robust} continuous barycenter, leveraging the dual formulation of the \textit{(semi-)unbalanced} OT problem. To the best of our knowledge, this paper is the first attempt to develop an algorithm for robust barycenters under the continuous distribution setup. Our method is framed as a $\min$-$\max$ optimization problem and is adaptable to \textit{general} cost functions. We rigorously establish the theoretical underpinnings of the proposed method and demonstrate its robustness to outliers and class imbalance through a number of illustrative experiments.
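
To make the outlier sensitivity mentioned in the abstract concrete, here is a minimal sketch of the classical (non-robust) barycenter in the simplest possible setting. It is not the paper's method: it assumes one-dimensional data, the squared-distance cost, and equal-size empirical samples, where the Wasserstein-2 barycenter is known to be obtained by averaging sorted samples (that is, quantile functions). The function name `w2_barycenter_1d`, the Gaussian sources, and the 5% contamination level are illustrative choices, not taken from the paper.

```python
import numpy as np


def w2_barycenter_1d(samples, weights=None):
    """Classical 1-D W2 barycenter of equal-size empirical measures.

    In one dimension, the barycenter's quantile function is the weighted
    average of the sources' quantile functions, so for equal-size samples
    it suffices to average the sorted samples.
    """
    sorted_samples = np.stack(
        [np.sort(np.asarray(s, dtype=float)) for s in samples]
    )
    k = sorted_samples.shape[0]
    if weights is None:
        weights = np.full(k, 1.0 / k)  # uniform barycenter weights
    weights = np.asarray(weights, dtype=float)
    return weights @ sorted_samples  # shape: (n,)


rng = np.random.default_rng(0)
n = 1000
source_a = rng.normal(-2.0, 1.0, n)
source_b = rng.normal(2.0, 1.0, n)

# Barycenter of the two clean sources.
clean = w2_barycenter_1d([source_a, source_b])

# Replace 5% of one source with far-away outliers (hypothetical contamination).
contaminated_b = source_b.copy()
contaminated_b[: n // 20] = 50.0
contaminated = w2_barycenter_1d([source_a, contaminated_b])

print("largest barycenter point, clean sources:       ", clean.max())
print("largest barycenter point, contaminated source: ", contaminated.max())
```

Because balanced OT must transport all of the mass of every source, the outliers are carried into the upper tail of the barycenter. This is the failure mode that relaxing the marginal constraints, as in (semi-)unbalanced OT, is intended to mitigate.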