Generalizing Fair Clustering to Multiple Groups: Algorithms and Applications

📅 2025-11-14
🤖 AI Summary
This work studies multi-group fair clustering: ensuring fair representation across an arbitrary number of protected groups defined by attributes such as age, ethnicity, and gender. The closest fair clustering problem, introduced by Chakraborty et al. [COLT'25], had previously been studied only for two groups. We generalize it to more than two groups and show that it is NP-hard even when all groups are of equal size, in contrast with the two-group case, for which an exact algorithm exists. We propose near-linear-time approximation algorithms that handle multiple groups of arbitrary sizes, answering an open question posed by Chakraborty et al. [COLT'25]. Building on these algorithms, we obtain improved approximation guarantees for fair correlation clustering and the first approximation algorithms for fair consensus clustering with more than two groups.

📝 Abstract
Clustering is a fundamental task in machine learning and data analysis, but it frequently fails to provide fair representation for various marginalized communities defined by multiple protected attributes -- a shortcoming often caused by biases in the training data. As a result, there is a growing need to enhance the fairness of clustering outcomes, ideally by making minimal modifications, possibly as a post-processing step after conventional clustering. Recently, Chakraborty et al. [COLT'25] initiated the study of closest fair clustering, though in a restricted scenario where data points belong to only two groups. In practice, however, data points are typically characterized by many groups, reflecting diverse protected attributes such as age, ethnicity, gender, etc. In this work, we generalize the study of the closest fair clustering problem to settings with an arbitrary number (more than two) of groups. We begin by showing that the problem is NP-hard even when all groups are of equal size -- a stark contrast with the two-group case, for which an exact algorithm exists. Next, we propose near-linear time approximation algorithms that efficiently handle arbitrary-sized multiple groups, thereby answering an open question posed by Chakraborty et al. [COLT'25]. Leveraging our closest fair clustering algorithms, we further achieve improved approximation guarantees for the fair correlation clustering problem, advancing the state-of-the-art results established by Ahmadian et al. [AISTATS'20] and Ahmadi et al. [2020]. Additionally, we are the first to provide approximation algorithms for the fair consensus clustering problem involving multiple (more than two) groups, thus addressing another open direction highlighted by Chakraborty et al. [COLT'25].
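To make the closest fair clustering objective concrete, here is a minimal Python sketch. It is not the paper's algorithm and rests on two assumptions the abstract does not spell out: that a clustering counts as fair when every cluster has the same group proportions as the whole dataset, and that the distance between two clusterings is the number of point pairs grouped together in one but separated in the other. The function names `is_fair` and `pair_distance` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def is_fair(clusters, group_of):
    """Assumed fairness notion: each non-empty cluster has the same
    group proportions as the full dataset."""
    total = Counter(group_of[p] for cluster in clusters for p in cluster)
    n = sum(total.values())
    for cluster in clusters:
        if not cluster:
            continue  # an empty cluster places no constraint
        local = Counter(group_of[p] for p in cluster)
        for group, count in total.items():
            # Exact rational comparison avoids floating-point error.
            if Fraction(local[group], len(cluster)) != Fraction(count, n):
                return False
    return True


def pair_distance(clusters_a, clusters_b):
    """Assumed distance: number of point pairs that are together in one
    clustering and apart in the other."""
    label_a = {p: i for i, c in enumerate(clusters_a) for p in c}
    label_b = {p: i for i, c in enumerate(clusters_b) for p in c}
    return sum(
        (label_a[p] == label_a[q]) != (label_b[p] == label_b[q])
        for p, q in combinations(sorted(label_a), 2)
    )


# Toy data: six points in three equal-size groups (illustrative only).
group_of = {0: "r", 1: "r", 2: "b", 3: "b", 4: "g", 5: "g"}
given = [[0, 1, 2], [3, 4, 5]]       # output of some conventional clustering
candidate = [[0, 2, 4], [1, 3, 5]]   # one fair alternative

print(is_fair(given, group_of))         # False
print(is_fair(candidate, group_of))     # True
print(pair_distance(given, candidate))  # 8
```

Under these assumptions, the closest fair clustering problem asks for a fair clustering that minimizes `pair_distance` to the given one. The sketch only evaluates a single candidate and does not search for the minimizer.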
Problem

Research questions and friction points this paper is trying to address.

Extends closest fair clustering to handle multiple protected groups
Develops efficient approximation algorithms for multi-group fairness
Advances fair correlation and consensus clustering for multiple groups
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalized closest fair clustering to multiple groups
Developed near-linear time approximation algorithms
Provided the first approximation algorithms for fair consensus clustering with more than two groups
Diptarka Chakraborty
School of Computing, National University of Singapore
Theoretical Computer Science
Kushagra Chatterjee
National University of Singapore
Debarati Das
Pennsylvania State University
T. Nguyen
Pennsylvania State University