AI Summary
This work gives a unified treatment of centroid-based and non-centroid-based notions of proportional fairness in clustering. It introduces a "semi-centroid clustering" framework in which each data point's loss combines its centroid and non-centroid losses, and studies two fairness criteria: the core and its relaxation, fully justified representation (FJR). The main result is a polynomial-time algorithm that achieves a constant-factor approximation to the core, even when the centroid and non-centroid losses are measured with different distance metrics. For more restricted loss functions and for the weaker FJR criterion, the authors obtain improved guarantees, and they establish lower bounds in each case.
Abstract
Proportional fairness criteria inspired by democratic ideals of proportional representation have received growing attention in the clustering literature. Prior work has investigated them in two separate paradigms. Chen et al. [ICML 2019] study centroid clustering, in which each data point's loss is determined by its distance to a representative point (centroid) chosen in its cluster. Caragiannis et al. [NeurIPS 2024] study non-centroid clustering, in which each data point's loss is determined by its maximum distance to any other data point in its cluster. We generalize both paradigms to introduce semi-centroid clustering, in which each data point's loss is a combination of its centroid and non-centroid losses, and study two proportional fairness criteria: the core and its relaxation, fully justified representation (FJR). Our main result is a novel algorithm which achieves a constant approximation to the core, in polynomial time, even when the distance metrics used for centroid and non-centroid loss measurements are different. We also derive improved results for more restricted loss functions and the weaker FJR criterion, and establish lower bounds in each case.
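To make the three loss notions concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes the "combination" is a weighted sum controlled by a hypothetical parameter `lam`; the abstract does not specify the form of the combination, so this choice is purely illustrative. The function names and the two example metrics (Euclidean for the centroid loss, Manhattan for the non-centroid loss) are likewise assumptions, chosen only to show that the two losses may use different metrics.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(p: Point, q: Point) -> float:
    """Example metric for the centroid loss (an assumption)."""
    return math.dist(p, q)


def manhattan(p: Point, q: Point) -> float:
    """Example metric for the non-centroid loss (an assumption)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def centroid_loss(point: Point, centroid: Point, d_c: Metric) -> float:
    """Centroid paradigm: distance from the point to its cluster's centroid."""
    return d_c(point, centroid)


def non_centroid_loss(point: Point, cluster: Sequence[Point], d_m: Metric) -> float:
    """Non-centroid paradigm: maximum distance to any point in the same cluster.

    Including the point itself is harmless, since its distance to itself is 0.
    """
    return max((d_m(point, other) for other in cluster), default=0.0)


def semi_centroid_loss(
    point: Point,
    cluster: Sequence[Point],
    centroid: Point,
    d_c: Metric,
    d_m: Metric,
    lam: float = 0.5,
) -> float:
    """Hypothetical weighted-sum combination of the two losses.

    lam = 1 recovers the pure centroid loss, lam = 0 the pure non-centroid
    loss. The weighted sum is an illustrative assumption, not the paper's
    definition.
    """
    return lam * centroid_loss(point, centroid, d_c) + (1.0 - lam) * non_centroid_loss(
        point, cluster, d_m
    )


if __name__ == "__main__":
    cluster = [(0.0, 0.0), (3.0, 4.0), (1.0, 0.0)]
    centroid = (0.0, 0.0)
    for p in cluster:
        print(p, semi_centroid_loss(p, cluster, centroid, euclidean, manhattan))
```

Setting `lam` to 1 or 0 in this sketch reduces to the centroid setting of Chen et al. or the non-centroid setting of Caragiannis et al., respectively, which is the sense in which the semi-centroid model generalizes both.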