Transport Clustering: Solving Low-Rank Optimal Transport via Clustering

📅 2026-03-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses low-rank optimal transport (OT), which can reveal latent structure in data and improve statistical stability but leads to a non-convex, NP-hard optimization problem. The authors reduce the problem to a clustering task with their "transport clustering" algorithm: it first computes a full-rank OT solution (a transport registration step) to obtain correspondences, then clusters those correspondences to construct a low-rank transport plan. The reduction yields polynomial-time, constant-factor approximation algorithms: a $(1+\gamma)$ approximation for negative-type metrics and a $(1+\gamma+\sqrt{2\gamma})$ approximation for kernel costs, with $\gamma \in [0,1]$. Empirically, the algorithm outperforms existing low-rank OT solvers on synthetic benchmarks and on large-scale, high-dimensional datasets.

📝 Abstract

Optimal transport (OT) finds a least-cost transport plan between two probability distributions using a cost matrix defined on pairs of points. Unlike standard OT, which infers unstructured pointwise mappings, low-rank optimal transport explicitly constrains the rank of the transport plan to infer latent structure. This improves statistical stability and robustness, yields sharper parametric rates for estimating Wasserstein distances adaptive to the intrinsic rank, and generalizes $K$-means to co-clustering. These advantages, however, come at the cost of a non-convex and NP-hard optimization problem. We introduce transport clustering, an algorithm to compute a low-rank OT plan that reduces low-rank OT to a clustering problem on correspondences obtained from a full-rank $\textit{transport registration}$ step. We prove that this reduction yields polynomial-time, constant-factor approximation algorithms for low-rank OT: specifically, a $(1+\gamma)$ approximation for negative-type metrics and a $(1+\gamma+\sqrt{2\gamma})$ approximation for kernel costs, where $\gamma\in [0,1]$ denotes the approximation ratio of the optimal full-rank solution relative to the low-rank optimum. Empirically, transport clustering outperforms existing low-rank OT solvers on synthetic benchmarks and large-scale, high-dimensional datasets.
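
The two-stage idea in the abstract (register with full-rank OT, then cluster the correspondences) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes two equal-size point clouds with uniform weights and a squared Euclidean cost, so that the full-rank OT step reduces to an assignment problem. It also swaps in plain Lloyd's K-means on the concatenated matched pairs as a stand-in for the paper's clustering step. The function names `sq_dists`, `lloyd_kmeans`, and `transport_clustering_sketch` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)


def lloyd_kmeans(Z, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns one integer label per row of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        labels = sq_dists(Z, centers).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


def transport_clustering_sketch(X, Y, k, seed=0):
    """Illustrative only: assumes len(X) == len(Y) and uniform weights."""
    n = len(X)
    C = sq_dists(X, Y)

    # Step 1 (registration): with uniform weights and equal sizes, full-rank
    # OT is an assignment problem; sigma[i] is the point of Y matched to X[i].
    rows, cols = linear_sum_assignment(C)
    sigma = np.empty(n, dtype=int)
    sigma[rows] = cols

    # Step 2 (clustering): ASSUMPTION, not the paper's procedure. Cluster the
    # matched pairs (x_i, y_sigma(i)) jointly with K-means.
    Z = np.hstack([X, Y[sigma]])
    labels = lloyd_kmeans(Z, k, seed=seed)
    _, labels = np.unique(labels, return_inverse=True)  # drop empty clusters
    r = labels.max() + 1

    # Step 3: build the factored plan P = Q diag(1/g) R^T, with rank at most r.
    Q = np.zeros((n, r))
    Q[np.arange(n), labels] = 1.0 / n
    R = np.zeros((n, r))
    R[sigma, labels] = 1.0 / n
    g = Q.sum(axis=0)  # mass of each cluster
    P = (Q / g) @ R.T
    return P, C, sigma
```

Because each matched pair shares one cluster label, both marginals of `P` stay uniform by construction, and the transport cost `(C * P).sum()` can be compared against the full-rank cost `C[np.arange(n), sigma].mean()` to see how much is lost by constraining the rank.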

Problem

Research questions and friction points this paper is trying to address.

low-rank optimal transport

non-convex optimization

NP-hard problem

optimal transport

transport plan

Innovation

Methods, ideas, or system contributions that make the work stand out.

low-rank optimal transport

transport clustering

co-clustering