Streaming and Massively Parallel Algorithms for Euclidean Max-Cut

📅 2025-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies the Euclidean Max-Cut problem: given a set of vectors $X = \{x_1,\dots,x_n\} \subset \mathbb{R}^d$, partition them into two subsets to maximize the sum of Euclidean distances across the cut. For large-scale, high-dimensional settings, it gives the first dynamic streaming algorithm that provides oracle access to a $(1+\varepsilon)$-approximate cut, using sublinear space $\mathrm{poly}(d \log \Delta / \varepsilon)$. This resolves an open problem left by Chen, Jiang, and Krauthgamer [STOC '23]. Additionally, it designs a unified framework combining parallelization and subsampling that, in the Massively Parallel Computation (MPC) model, attains the same approximation guarantee in a constant number of rounds using total space $O(nd) + n \cdot \mathrm{poly}(\log n / \varepsilon)$. Key technical innovations include a Euclidean adaptation of the Mathieu–Schudy greedy algorithm, dynamic stream compression, and distance-sensitive subspace analysis.
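To make the objective concrete, the following is a minimal sketch of the cut value being maximized, together with an exhaustive search that is only feasible for a handful of points. The function names `cut_value` and `brute_force_max_cut` and the unit-square example are illustrative assumptions, not taken from the paper.

```python
import math
from itertools import product

def cut_value(points, sides):
    """Sum of Euclidean distances over pairs whose endpoints lie on different sides."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if sides[i] != sides[j]:
                total += math.dist(points[i], points[j])
    return total

def brute_force_max_cut(points):
    """Try all 2^(n-1) partitions (first point fixed on side 0); tiny n only."""
    best_sides, best_value = None, -1.0
    for rest in product((0, 1), repeat=len(points) - 1):
        sides = (0,) + rest
        value = cut_value(points, sides)
        if value > best_value:
            best_sides, best_value = sides, value
    return best_sides, best_value

# Hypothetical example: the four corners of the unit square, in cyclic order.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
sides, value = brute_force_max_cut(square)
print(sides, value)
```

On the unit square, splitting along an edge direction (two adjacent corners per side) cuts two sides and both diagonals for a value of $2 + 2\sqrt{2}$, which beats the alternating split of value 4. The exponential search is exactly what the streaming and MPC algorithms are meant to avoid.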

📝 Abstract
Given a set of vectors $X = \{ x_1,\dots, x_n \} \subset \mathbb{R}^d$, the Euclidean max-cut problem asks to partition the vectors into two parts so as to maximize the sum of Euclidean distances which cross the partition. We design new algorithms for Euclidean max-cut in models for massive datasets:
• We give a fully-scalable constant-round MPC algorithm using $O(nd) + n \cdot \text{poly}(\log(n) / \epsilon)$ total space which gives a $(1+\epsilon)$-approximate Euclidean max-cut.
• We give a dynamic streaming algorithm using $\text{poly}(d \log \Delta / \epsilon)$ space when $X \subseteq [\Delta]^d$, which provides oracle access to a $(1+\epsilon)$-approximate Euclidean max-cut.
Recently, Chen, Jiang, and Krauthgamer [STOC '23] gave a dynamic streaming algorithm with space $\text{poly}(d \log \Delta / \epsilon)$ to approximate the value of the Euclidean max-cut, but could not provide oracle access to an approximately optimal cut. This was left open in that work, and we resolve it here. Both algorithms follow from the same framework, which analyzes a "parallel" and "subsampled" (Euclidean) version of a greedy algorithm of Mathieu and Schudy [SODA '08] for dense max-cut.
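As a rough illustration of the greedy idea the framework builds on, here is a minimal sequential sketch: points are processed in random order and each one joins the side that cuts the larger total distance to the points already placed. This is a plain greedy with Euclidean weights, not the paper's "parallel" and "subsampled" variant, and the names `greedy_euclidean_cut`, `cut_value`, and `seed` are assumptions made for the example.

```python
import math
import random

def cut_value(points, sides):
    """Sum of Euclidean distances over pairs split by the partition."""
    n = len(points)
    return sum(
        math.dist(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
        if sides[i] != sides[j]
    )

def greedy_euclidean_cut(points, seed=0):
    """Sequential greedy: place each point (random order) on its better side."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)
    sides = [None] * len(points)
    placed = []
    for i in order:
        # dist_to[s] = total distance from point i to placed points on side s
        dist_to = [0.0, 0.0]
        for j in placed:
            dist_to[sides[j]] += math.dist(points[i], points[j])
        # Joining side 0 cuts the pairs to side 1, and vice versa.
        sides[i] = 0 if dist_to[1] >= dist_to[0] else 1
        placed.append(i)
    return sides

# Hypothetical usage on random points in the unit cube.
rng = random.Random(1)
pts = [(rng.random(), rng.random(), rng.random()) for _ in range(40)]
assignment = greedy_euclidean_cut(pts, seed=3)
print(cut_value(pts, assignment))
```

Because every point gains at least half of its distance to earlier points, this greedy always cuts at least half of the total pairwise distance. The $(1+\epsilon)$ guarantee in the abstract comes from the paper's analysis, which this sketch does not reproduce. The sketch also takes quadratic time and stores all points, which is what the MPC and streaming models rule out.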
Problem

Research questions and friction points this paper is trying to address.

Design scalable algorithms for Euclidean max-cut in massive datasets.
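Provide a dynamic streaming algorithm with oracle access to a $(1+\epsilon)$-approximate Euclidean max-cut.
Resolve the open problem, left by Chen, Jiang, and Krauthgamer [STOC '23], of giving oracle access to an approximately optimal cut rather than only its value.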
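"Oracle access" can be pictured as a small piece of state from which the side of any query point is computed on demand, instead of writing out all $n$ labels. The sketch below is a toy illustration of that interface only. It keeps a small random sample with greedy labels and assigns a query to the side that cuts more distance to the sample. The sampling rule, the name `build_cut_oracle`, and its parameters are assumptions; the sketch ignores insertions and deletions and carries no approximation guarantee.

```python
import math
import random

def build_cut_oracle(points, sample_size, seed=0):
    """Return a function side_of(x) backed by a small labeled sample (toy model)."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    sample_sides = []
    for k, p in enumerate(sample):
        # Label the sample greedily against the sample points labeled so far.
        dist_to = [0.0, 0.0]
        for q, s in zip(sample[:k], sample_sides):
            dist_to[s] += math.dist(p, q)
        sample_sides.append(0 if dist_to[1] >= dist_to[0] else 1)

    def side_of(x):
        # Assign x to the side that cuts more distance to the stored sample.
        dist_to = [0.0, 0.0]
        for q, s in zip(sample, sample_sides):
            dist_to[s] += math.dist(x, q)
        return 0 if dist_to[1] >= dist_to[0] else 1

    return side_of

# Hypothetical usage: the oracle stores 8 points but can label all 50.
pts = [(float(i), float(i % 7)) for i in range(50)]
oracle = build_cut_oracle(pts, sample_size=8, seed=1)
labels = [oracle(p) for p in pts]
print(labels)
```

The point of the interface is that the stored state depends on `sample_size`, not on the number of points queried, and repeated queries for the same point return the same side.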
Innovation

Methods, ideas, or system contributions that make the work stand out.

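Fully-scalable constant-round MPC algorithm for Euclidean max-cut using $O(nd) + n \cdot \text{poly}(\log(n) / \epsilon)$ total space
Dynamic streaming algorithm using $\text{poly}(d \log \Delta / \epsilon)$ space
Oracle access to a $(1+\epsilon)$-approximate Euclidean max-cut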