Towards Fair Representation: Clustering and Consensus

📅 2025-06-10
🤖 AI Summary
This paper formally defines and studies the fair consensus clustering problem: aggregating multiple input clusterings, each potentially based on different non-sensitive attributes, into a single clustering that both represents the collective structure of the data and is fair, i.e., each protected group appears in every cluster in proportion to its share of the whole dataset. The authors give what is, to the best of their knowledge, the first constant-factor approximation for this problem. A key ingredient is Closest Fair Clustering, the post-processing task of minimally modifying an existing clustering to enforce fairness. For datasets with equal group representation, the authors give an optimal algorithm; for two groups of unequal size, they give near-linear-time constant-factor approximation algorithms and show that the problem is NP-hard.
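The fairness condition described above can be illustrated with a small check. This is only a sketch under stated assumptions: it tests exact proportionality (every cluster has the same group mix as the whole dataset), and the names `is_fair`, `clusters`, and `group` are hypothetical, not taken from the paper.

```python
from collections import Counter


def is_fair(clusters, group):
    """Return True if every non-empty cluster mirrors the global group mix.

    clusters: list of lists of point ids (hypothetical representation).
    group:    dict mapping each point id to its protected-group label.
    """
    total = len(group)
    global_counts = Counter(group.values())
    for cluster in clusters:
        if not cluster:
            continue  # an empty cluster imposes no constraint
        local = Counter(group[p] for p in cluster)
        for g, n_g in global_counts.items():
            # Cross-multiply to compare local[g]/len(cluster) with n_g/total
            # exactly, without floating-point error.
            if local[g] * total != n_g * len(cluster):
                return False
    return True


# Six points, two "red" for every "blue".
group = {0: "red", 1: "red", 2: "blue", 3: "red", 4: "red", 5: "blue"}
balanced = [[0, 1, 2], [3, 4, 5]]    # each cluster is 2 red : 1 blue
unbalanced = [[0, 1, 3], [2, 4, 5]]  # first cluster has no blue point
print(is_fair(balanced, group), is_fair(unbalanced, group))
```

In this example the first clustering passes the check and the second does not, because its first cluster contains only "red" points.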

📝 Abstract
Consensus clustering, a fundamental task in machine learning and data analysis, aims to aggregate multiple input clusterings of a dataset, potentially based on different non-sensitive attributes, into a single clustering that best represents the collective structure of the data. In this work, we study this fundamental problem through the lens of fair clustering, as introduced by Chierichetti et al. [NeurIPS'17], which incorporates the disparate impact doctrine to ensure proportional representation of each protected group in the dataset within every cluster. Our objective is to find a consensus clustering that is not only representative but also fair with respect to specific protected attributes. To the best of our knowledge, we are the first to address this problem and provide a constant-factor approximation. As part of our investigation, we examine how to minimally modify an existing clustering to enforce fairness -- an essential postprocessing step in many clustering applications that require fair representation. We develop an optimal algorithm for datasets with equal group representation and near-linear time constant factor approximation algorithms for more general scenarios with different proportions of two group sizes. We complement our approximation result by showing that the problem is NP-hard for two unequal-sized groups. Given the fundamental nature of this problem, we believe our results on Closest Fair Clustering could have broader implications for other clustering problems, particularly those for which no prior approximation guarantees exist for their fair variants.
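To make the consensus objective concrete, here is a minimal sketch of how a candidate clustering might be scored against the inputs. The abstract does not name a distance measure or an aggregation rule, so both choices here are assumptions: the distance counts point pairs on which two clusterings disagree, and the cost sums that distance over all inputs. The function names are hypothetical.

```python
from itertools import combinations


def disagreement_distance(a, b):
    """Count point pairs that one clustering groups together and the other splits.

    a, b: lists of cluster labels, where a[i] is the cluster of point i.
    """
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def consensus_cost(candidate, inputs):
    """Total disagreement between a candidate clustering and all inputs (assumed objective)."""
    return sum(disagreement_distance(candidate, c) for c in inputs)


inputs = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 1, 1],
]
candidate = [0, 0, 1, 1]
print(consensus_cost(candidate, inputs))  # 0 + 3 + 3 = 6
```

Under these assumptions, a fair consensus clustering would be a candidate with low cost that also satisfies the proportional-representation constraint, and Closest Fair Clustering corresponds to the case of a single input clustering.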

Problem

Research questions and friction points this paper is trying to address.

Develop fair consensus clustering with protected attributes
Modify existing clustering to enforce fair representation
Provide approximation algorithms for unequal group sizes

Innovation

Methods, ideas, or system contributions that make the work stand out.

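First constant-factor approximation for fair consensus clustering with protected attributes
Optimal algorithm for Closest Fair Clustering under equal group representation
Near-linear-time constant-factor approximation for two groups of unequal size, with a matching NP-hardness result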

Diptarka Chakraborty
School of Computing, National University of Singapore
Theoretical Computer Science
Kushagra Chatterjee
National University of Singapore
Debarati Das
Pennsylvania State University
Tien Long Nguyen
Pennsylvania State University
Romina Nobahari
Sharif University of Technology