Round-efficient Fully-scalable MPC algorithms for k-Means

📅 2026-04-01

🤖 AI Summary

This work addresses the problem of efficiently approximating Euclidean k-Means in the Massively Parallel Computation (MPC) model, in the fully-scalable regime. It gives a fully-scalable O(1)-round algorithm that achieves an O((log n / log log n)²) approximation for k-Means. Earlier fully-scalable algorithms either needed super-constant O(log log n · log log log n) rounds or gave only bicriteria guarantees. The same algorithm yields an O(log n / log log n) approximation for k-Median. This improves the previous O(log n) bound and breaks the log n barrier inherent to the tree-embedding methods that bound relied on. The key technical tool is a new variant of the Mettu–Plaxton (MP) algorithm for general metrics. Its Lagrangian Multiplier Preserving (LMP) guarantee continues to hold under arbitrary distance distortion, which is what makes an efficient MPC implementation possible. As a byproduct, the techniques also give an O(1)-approximation to the optimal objective value (not to a solution) in O(1) rounds.
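For reference, the objectives and the LMP property mentioned above can be written as follows. This uses the standard textbook form of LMP from the facility-location literature. The symbols P, C, cost_z, OPT(λ) and α are introduced here for illustration, and the paper's exact formulation may differ, in particular in how distance distortion enters.

```latex
% P: input points, C: set of centers, d: Euclidean distance
% z = 1 gives k-Median, z = 2 gives k-Means
\mathrm{cost}_z(P, C) = \sum_{p \in P} \min_{c \in C} d(p, c)^z,
\qquad \text{minimize } \mathrm{cost}_z(P, C) \text{ subject to } |C| \le k.

% Lagrangian relaxation: drop |C| <= k and charge a price \lambda \ge 0 per center
\mathrm{OPT}(\lambda) = \min_{C} \bigl( \mathrm{cost}_z(P, C) + \lambda |C| \bigr)

% LMP \alpha-approximation: the output C_\lambda satisfies
\mathrm{cost}_z(P, C_\lambda) + \alpha \lambda |C_\lambda| \le \alpha \cdot \mathrm{OPT}(\lambda)
```

The point of the extra factor α on the center price is this. If some λ yields exactly k centers, then, since OPT(λ) ≤ OPT_k + λk, the output satisfies cost_z(P, C_λ) ≤ α · OPT(λ) − αλk ≤ α · OPT_k. In general one may not hit exactly k, and the usual approach combines two solutions whose sizes bracket k.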

📝 Abstract

We study Euclidean $k$-Means under the Massively Parallel Computation (MPC) model, focusing on the \emph{fully-scalable} setting. Our main result is a fully-scalable $O((\log n/\log\log n)^2)$-approximation in $O(1)$ rounds. Previously, fully-scalable algorithms for $k$-Means either run in super-constant $O(\log\log n \cdot \log\log\log n)$ rounds, albeit with a better $O(1)$-approximation [Cohen-Addad et al., SODA'26], or suffer from bicriteria guarantees [Bhaskara and Wijewardena, ICML'18; Czumaj et al., ICALP'24]. Our algorithm also gives an $O(\log n/\log\log n)$-approximation for $k$-Median, which improves a recent $O(\log n)$-approximation [Goranci et al., SODA'26], and this $o(\log n)$ ratio breaks the fundamental barrier of tree embedding methods used therein. Our main technical contribution is a new variant of the MP algorithm [Mettu and Plaxton, SICOMP'03] that works for general metrics, whose new guarantee is the Lagrangian Multiplier Preserving (LMP) property, which, importantly, holds even under arbitrary distance distortions. Allowing distance distortion is crucial for efficient MPC implementations and useful for efficient algorithm design in general, whereas preserving the LMP property under distance distortion is known to be a significant technical challenge. As a byproduct of our techniques, we also obtain an $O(1)$-approximation to the optimal \emph{value} in $O(1)$ rounds, which conceptually suggests that achieving a true $O(1)$-approximation (for the solution) in $O(1)$ rounds may be a sensible goal for future study.
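To make the "MP algorithm" concrete, here is a minimal sequential sketch of the classic radius-based facility-location rule commonly attributed to Mettu and Plaxton. It assumes a uniform opening cost f. This is not the paper's new LMP-preserving variant. It is not an MPC implementation and does not model distance distortion. The names `mp_radius` and `mp_facility_location` are hypothetical. Each point x gets a radius r_x solving Σ_y max(0, r_x − d(x, y)) = f. Points are then scanned by increasing radius, and x is opened if no already-open facility lies within 2·r_x.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def mp_radius(x: Point, points: Sequence[Point], f: float, dist: Metric) -> float:
    """Hypothetical helper: return r >= 0 solving sum_y max(0, r - dist(x, y)) = f."""
    ds = sorted(dist(x, y) for y in points)
    prefix = 0.0
    for j, d in enumerate(ds, start=1):
        prefix += d
        # If exactly the j nearest points lie inside the ball, j * r - prefix = f.
        r = (f + prefix) / j
        nxt = ds[j] if j < len(ds) else math.inf
        if r <= nxt:  # the (j+1)-th nearest point is not strictly inside, so r is consistent
            return r
    return math.inf  # only reached when `points` is empty


def mp_facility_location(points: Sequence[Point], f: float, dist: Metric) -> List[int]:
    """Hypothetical sketch of the classic MP greedy rule; returns indices of opened points."""
    radii = [mp_radius(x, points, f, dist) for x in points]
    order = sorted(range(len(points)), key=lambda i: radii[i])  # smallest radius first
    opened: List[int] = []
    for i in order:
        # Open i only if no already-open facility lies within distance 2 * r_i.
        if all(dist(points[i], points[j]) > 2 * radii[i] for j in opened):
            opened.append(i)
    return opened


# Usage: two well-separated 1-D clusters with opening cost f = 1.
pts: List[Point] = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
opened = mp_facility_location(pts, 1.0, math.dist)
print(sorted(opened))
```

On this toy input, one facility opens per cluster, at the middle point of each. The abstract's point is that efficient MPC implementations only see approximate distances. A plain rule like this one is known to lose the LMP property under such distortion, and the paper's variant is designed to avoid that.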

Problem

k-Means

Massively Parallel Computation

fully-scalable

round complexity

approximation algorithm

Innovation

Massively Parallel Computation

k-Means clustering

Lagrangian Multiplier Preserving