Diversity-aware clustering: Computational Complexity and Approximation Algorithms

📅 2024-01-10

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This paper studies diversity-aware clustering, where data points carry multiple attributes and therefore belong to overlapping groups. The number of cluster centers chosen from each group must lie within prescribed lower and upper bounds, while the solution minimizes the $k$-median, $k$-means, or $k$-supplier objective. The paper first analyzes the computational complexity of these problems, covering NP-hardness, polynomial-time inapproximability, and fixed-parameter intractability. It then presents parameterized approximation algorithms with ratios of approximately 1.736, 3.943, and 5 for the $k$-median, $k$-means, and $k$-supplier variants, respectively; assuming Gap-ETH, the $k$-median and $k$-means ratios are tight. The same factors carry over to the fair variants with disjoint groups and lower-bound requirements, so the results extend fairness-aware clustering from the disjoint-group setting to the overlapping-group setting.

📝 Abstract

In this work, we study diversity-aware clustering problems where the data points are associated with multiple attributes resulting in intersecting groups. A clustering solution needs to ensure that the number of chosen cluster centers from each group should be within the range defined by a lower and upper bound threshold for each group, while simultaneously minimizing the clustering objective, which can be either $k$-median, $k$-means or $k$-supplier. We study the computational complexity of the proposed problems, offering insights into their NP-hardness, polynomial-time inapproximability, and fixed-parameter intractability. We present parameterized approximation algorithms with approximation ratios $1 + \frac{2}{e} + \epsilon \approx 1.736$, $1 + \frac{8}{e} + \epsilon \approx 3.943$, and $5$ for diversity-aware $k$-median, diversity-aware $k$-means and diversity-aware $k$-supplier, respectively. Assuming Gap-ETH, the approximation ratios are tight for the diversity-aware $k$-median and diversity-aware $k$-means problems. Our results imply the same approximation factors for their respective fair variants with disjoint groups -- fair $k$-median, fair $k$-means, and fair $k$-supplier -- with lower bound requirements.
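To make the problem statement concrete, the following is a minimal Python sketch of the constraint and the three objectives, together with an exhaustive search that is only usable on tiny instances. It is not the paper's algorithm. The function names, the representation of groups as sets of candidate-center indices, and the use of Euclidean distance are all assumptions made for illustration.

```python
import math
from itertools import combinations


def satisfies_diversity(center_ids, groups, lower, upper):
    """Check that each (possibly overlapping) group contributes a number of
    chosen centers within its [lower, upper] range.

    groups: dict mapping a group name to a set of candidate-center indices
            (assumed representation, not taken from the paper).
    """
    chosen = set(center_ids)
    for name, members in groups.items():
        count = len(chosen & members)
        if not (lower[name] <= count <= upper[name]):
            return False
    return True


def clustering_cost(points, centers, objective="median"):
    """Cost of assigning every point to its nearest chosen center."""
    dists = [min(math.dist(p, c) for c in centers) for p in points]
    if objective == "median":    # k-median: sum of distances
        return sum(dists)
    if objective == "means":     # k-means: sum of squared distances
        return sum(d * d for d in dists)
    if objective == "supplier":  # k-supplier: largest distance
        return max(dists)
    raise ValueError(f"unknown objective: {objective}")


def brute_force(points, facilities, groups, lower, upper, k, objective="median"):
    """Try every k-subset of candidate centers; exponential, illustration only."""
    best = None
    for ids in combinations(range(len(facilities)), k):
        if not satisfies_diversity(ids, groups, lower, upper):
            continue
        cost = clustering_cost(points, [facilities[i] for i in ids], objective)
        if best is None or cost < best[0]:
            best = (cost, ids)
    return best  # None if no k-subset satisfies the constraints


# Toy instance: candidate center 1 belongs to both groups A and B.
points = [(0, 0), (1, 0), (10, 0), (11, 0)]
facilities = list(points)
groups = {"A": {0, 1}, "B": {1, 2}}
lower = {"A": 1, "B": 1}
upper = {"A": 1, "B": 1}
result = brute_force(points, facilities, groups, lower, upper, k=2)
```

In this toy instance, picking candidates 1 and 2 is infeasible because both lie in group B, which exceeds its upper bound of one. The overlap at candidate 1 is what distinguishes this setting from the disjoint-group fair variants.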

Problem

Research questions and friction points this paper is trying to address.

Study diversity-aware clustering with intersecting groups and bounds

Analyze NP-hardness and inapproximability of clustering objectives

Develop approximation algorithms for k-median, k-means, k-supplier

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diversity-aware clustering with intersecting group constraints

Parameterized approximation algorithms for NP-hard problems

Tight approximation ratios for k-median and k-means
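As a quick arithmetic check of the ratios quoted in the abstract, the snippet below evaluates the closed forms $1 + \frac{2}{e}$ and $1 + \frac{8}{e}$ with the $\epsilon$ term left out. The variable names are illustrative only.

```python
import math

# Closed-form ratios from the abstract, omitting the additive epsilon term.
median_ratio = 1 + 2 / math.e   # diversity-aware k-median
means_ratio = 1 + 8 / math.e    # diversity-aware k-means

print(f"k-median: {median_ratio:.3f}")
print(f"k-means:  {means_ratio:.3f}")
```

Rounded to three decimals, these agree with the quoted values of 1.736 and 3.943.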

🔎 Similar Papers

A Parametrizable Algorithm for Distributed Approximate Similarity Search with Arbitrary Distances