A Polynomial-Time Approximation for Pairwise Fair k-Median Clustering

📅 2024-05-16
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies pairwise fair $k$-median clustering with $\ell \ge 2$ demographic groups, focusing on the case $\ell > 2$. Each cluster must satisfy strict fairness constraints: the number of points from any group in a cluster is at most $t$ times the number of points from any other group in that cluster, for a given integer $t$. For $\ell > 2$, prior work on fair clustering yields only bicriteria approximations or exponential-time algorithms. We present the first polynomial-time algorithm that satisfies the fairness constraints *exactly* while attaining an $O(k^2 \cdot \ell \cdot t)$-approximation for the $k$-median cost. We complement this with hardness-of-approximation results showing that, even for $\ell = 2$, the problem is almost as hard as uniform capacitated $k$-median, for which no polynomial-time $o(\log k)$-approximation is known. Our approach integrates Lagrangian relaxation, combinatorial optimization techniques, and hardness reductions. This gives the first provable guarantee for multi-group pairwise fair clustering that is both polynomial-time and free of fairness violations.

📝 Abstract
In this work, we study pairwise fair clustering with $\ell \ge 2$ groups, where for every cluster $C$ and every group $i \in [\ell]$, the number of points in $C$ from group $i$ must be at most $t$ times the number of points in $C$ from any other group $j \in [\ell]$, for a given integer $t$. To the best of our knowledge, only bi-criteria approximation and exponential-time algorithms follow for this problem from the prior work on fair clustering problems when $\ell > 2$. In our work, focusing on the $\ell > 2$ case, we design the first polynomial-time $O(k^2 \cdot \ell \cdot t)$-approximation for this problem with $k$-median cost that does not violate the fairness constraints. We complement our algorithmic result by providing hardness of approximation results, which show that our problem even when $\ell = 2$ is almost as hard as the popular uniform capacitated $k$-median, for which no polynomial-time algorithm with an approximation factor of $o(\log k)$ is known.
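
To make the constraint in the abstract concrete, here is a minimal Python sketch that checks whether a given clustering is pairwise fair and evaluates its $k$-median cost. It is not the paper's algorithm, only a verifier for a candidate solution. The function names, the Euclidean metric, and the toy data are assumptions chosen for illustration.

```python
from collections import Counter
from math import dist  # Euclidean distance; the metric is an assumption here


def is_pairwise_fair(cluster_groups, num_groups, t):
    """Check one cluster: count(i) <= t * count(j) for every pair of groups.

    cluster_groups holds the group label (0..num_groups-1) of each point
    in the cluster. Checking all ordered pairs is equivalent to checking
    that the largest group count is at most t times the smallest.
    """
    counts = Counter(cluster_groups)
    sizes = [counts.get(g, 0) for g in range(num_groups)]
    return max(sizes) <= t * min(sizes)


def clustering_is_fair(groups, assignment, num_clusters, num_groups, t):
    """Check every non-empty cluster of a clustering."""
    clusters = [[] for _ in range(num_clusters)]
    for g, c in zip(groups, assignment):
        clusters[c].append(g)
    return all(is_pairwise_fair(cl, num_groups, t) for cl in clusters if cl)


def k_median_cost(points, centers, assignment):
    """Sum of distances from each point to its assigned center."""
    return sum(dist(p, centers[c]) for p, c in zip(points, assignment))


# Toy instance (hypothetical data): 3 groups, 2 clusters, t = 1.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
groups = [0, 1, 2, 0, 1, 2]
assignment = [0, 0, 0, 1, 1, 1]
centers = [(0, 0), (10, 10)]

fair = clustering_is_fair(groups, assignment, num_clusters=2, num_groups=3, t=1)
cost = k_median_cost(points, centers, assignment)
```

Note that a non-empty cluster missing some group entirely is never fair under this definition, since the missing group's count of zero cannot bound any positive count.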

Problem

Research questions and friction points this paper is trying to address.

Polynomial-time approximation for clustering
Fairness constraints in k-median clustering
Handling multiple groups in clustering algorithms

Innovation

Methods, ideas, or system contributions that make the work stand out.

Polynomial-time approximation algorithm
Fair clustering with multiple groups
K-median cost optimization