On Approximability of $\ell_2^2$ Min-Sum Clustering

📅 2024-12-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies the approximation hardness and efficient algorithm design for $\ell_2^2$ min-sum $k$-clustering, i.e., minimizing the sum of squared Euclidean distances within each cluster. First, it establishes the first NP-hardness-of-approximation lower bounds: $1.056$ unconditionally, and $1.327$ assuming a balanced variant of the Johnson Coverage Hypothesis. Second, it gives the first near-linear-time parameterized PTAS, running in time $O(n^{1+o(1)} d \cdot \exp((k/\varepsilon)^{O(1)}))$. Third, it introduces a learning-augmented framework that achieves a $\frac{1+\gamma\alpha}{(1-\alpha)^2}$-approximation guarantee under label-prediction error $\alpha$, thereby beating the classical approximation limits when $\alpha$ is small. The analysis combines Johnson-coverage-based hardness reductions with techniques from combinatorial optimization and computational geometry to characterize the intrinsic complexity of the problem and to establish a paradigm for prior-informed clustering.

📝 Abstract
The $\ell_2^2$ min-sum $k$-clustering problem is to partition an input set into clusters $C_1,\ldots,C_k$ to minimize $\sum_{i=1}^k\sum_{p,q\in C_i}\|p-q\|_2^2$. Although $\ell_2^2$ min-sum $k$-clustering is NP-hard, it is not known whether it is NP-hard to approximate $\ell_2^2$ min-sum $k$-clustering beyond a certain factor. In this paper, we give the first hardness-of-approximation result for the $\ell_2^2$ min-sum $k$-clustering problem. We show that it is NP-hard to approximate the objective to a factor better than $1.056$ and moreover, assuming a balanced variant of the Johnson Coverage Hypothesis, it is NP-hard to approximate the objective to a factor better than $1.327$. We then complement our hardness result by giving a nearly linear time parameterized PTAS for $\ell_2^2$ min-sum $k$-clustering running in time $O\left(n^{1+o(1)}d\cdot \exp((k\cdot\varepsilon^{-1})^{O(1)})\right)$, where $d$ is the underlying dimension of the input dataset. Finally, we consider a learning-augmented setting, where the algorithm has access to an oracle that outputs a label $i\in[k]$ for each input point, thereby implicitly partitioning the input dataset into $k$ clusters that induce an approximately optimal solution, up to some amount of adversarial error $\alpha\in\left[0,\frac{1}{2}\right)$. We give a polynomial-time algorithm that outputs a $\frac{1+\gamma\alpha}{(1-\alpha)^2}$-approximation to $\ell_2^2$ min-sum $k$-clustering, for a fixed constant $\gamma>0$.
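As a concrete reading of the objective above, here is a small NumPy sketch. The function name `min_sum_objective`, its signature, and the convention of counting each unordered pair $\{p,q\}$ once are our illustrative assumptions, not details from the paper:

```python
import numpy as np

def min_sum_objective(points, labels):
    """Sum, over clusters, of all pairwise squared Euclidean distances.

    points: (n, d) array-like; labels: length-n cluster assignments.
    Counts each unordered pair {p, q} once (illustrative convention).
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        P = points[labels == c]
        # All pairwise differences within cluster c via broadcasting.
        diffs = P[:, None, :] - P[None, :, :]
        sq = (diffs ** 2).sum(axis=-1)
        # sq counts each ordered pair, so each unordered pair twice
        # (diagonal p = q contributes 0); halve to count pairs once.
        total += sq.sum() / 2.0
    return total
```

With this unordered-pair convention, the term for cluster $C_i$ equals $|C_i|$ times the $k$-means (sum of squared distances to the centroid) cost of $C_i$, a standard identity that explains why min-sum is often contrasted with $k$-means.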
Problem

Research questions and friction points this paper is trying to address.

Hardness of approximating $\ell_2^2$ min-sum $k$-clustering beyond certain factors
Nearly linear time PTAS for $\ell_2^2$ min-sum $k$-clustering
Learning-augmented algorithm for adversarial error in clustering
Innovation

Methods, ideas, or system contributions that make the work stand out.

NP-hardness proof for approximation limits
Nearly linear time parameterized PTAS
Learning-augmented algorithm with oracle access
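The learning-augmented guarantee quoted above, $\frac{1+\gamma\alpha}{(1-\alpha)^2}$, tends to $1$ as the prediction error $\alpha \to 0$. A one-line helper (hypothetical; the paper fixes $\gamma$ but does not specify its value) makes the trade-off easy to tabulate:

```python
def approx_ratio(alpha: float, gamma: float) -> float:
    """Approximation factor (1 + gamma*alpha) / (1 - alpha)**2 for
    oracle error rate alpha in [0, 1/2); gamma > 0 is the paper's
    fixed constant (value left unspecified here)."""
    assert 0.0 <= alpha < 0.5, "guarantee is stated for alpha in [0, 1/2)"
    return (1.0 + gamma * alpha) / (1.0 - alpha) ** 2
```

For any fixed $\gamma$, the ratio equals $1$ at $\alpha = 0$ and increases with $\alpha$ (the numerator grows while the denominator shrinks), so with sufficiently accurate predictions it drops below the unconditional $1.056$ hardness threshold.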