Towards Fair Representation: Clustering and Consensus

📅 2025-06-10

🤖 AI Summary

This paper formally defines and studies the fair consensus clustering problem. The goal is to aggregate multiple input clusterings, each potentially built on different non-sensitive attributes, into a single clustering that both preserves the collective structure of the data and satisfies demographic fairness, i.e., each protected group appears in every cluster in proportion to its share of the whole dataset. The authors give the first constant-factor approximation algorithm for this problem. As part of this work, they study Closest Fair Clustering: minimally modifying an existing clustering so that it becomes fair, a common post-processing step. For datasets in which the protected groups have equal sizes, they give an exact optimal algorithm. For two groups with arbitrary proportions, they give near-linear-time constant-factor approximation algorithms and show that the problem is NP-hard in this setting.
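As a minimal sketch of the fairness constraint described above, the hypothetical helper `is_proportionally_fair` checks whether every cluster contains each protected group in exactly its global proportion. It assumes the exact-proportionality reading of the constraint and represents clusterings and group memberships as plain Python lists. Neither the function nor this representation comes from the paper.

```python
from collections import Counter
from fractions import Fraction


def is_proportionally_fair(labels, groups):
    """Return True if every cluster contains each protected group in exactly
    the same proportion as the whole dataset (assumed exact-fairness reading).

    labels[i] is the cluster id of point i; groups[i] is its protected group.
    """
    if len(labels) != len(groups):
        raise ValueError("labels and groups must have the same length")
    n = len(groups)
    # Global share of each protected group, as an exact fraction.
    global_share = {g: Fraction(c, n) for g, c in Counter(groups).items()}

    # Collect the group membership of every cluster.
    members = {}
    for cluster, group in zip(labels, groups):
        members.setdefault(cluster, []).append(group)

    for cluster_groups in members.values():
        size = len(cluster_groups)
        counts = Counter(cluster_groups)
        for g, share in global_share.items():
            # Exact rational comparison avoids floating-point tolerance issues.
            if Fraction(counts.get(g, 0), size) != share:
                return False
    return True
```

For example, with groups `["r", "b", "r", "b"]`, the clustering `[0, 0, 1, 1]` is fair (each cluster holds one point of each group), while `[0, 1, 0, 1]` is not. `fractions.Fraction` is used so that proportions such as 1/3 compare exactly.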

📝 Abstract

Consensus clustering, a fundamental task in machine learning and data analysis, aims to aggregate multiple input clusterings of a dataset, potentially based on different non-sensitive attributes, into a single clustering that best represents the collective structure of the data. In this work, we study this fundamental problem through the lens of fair clustering, as introduced by Chierichetti et al. [NeurIPS'17], which incorporates the disparate impact doctrine to ensure proportional representation of each protected group in the dataset within every cluster. Our objective is to find a consensus clustering that is not only representative but also fair with respect to specific protected attributes. To the best of our knowledge, we are the first to address this problem and provide a constant-factor approximation. As part of our investigation, we examine how to minimally modify an existing clustering to enforce fairness (the Closest Fair Clustering problem), an essential post-processing step in many clustering applications that require fair representation. We develop an optimal algorithm for datasets with equal group representation and near-linear-time constant-factor approximation algorithms for the more general setting of two groups with different proportions. We complement our approximation result by showing that Closest Fair Clustering is NP-hard for two unequal-sized groups. Given the fundamental nature of this problem, we believe our results on Closest Fair Clustering could have broader implications for other clustering problems, particularly those for which no prior approximation guarantees exist for their fair variants.
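Consensus clustering is commonly formalized as finding a clustering that minimizes the total pairwise-disagreement distance to the input clusterings. The abstract does not spell out the distance, so treat this as an assumed objective rather than the paper's exact definition. The sketch below, with hypothetical names `disagreement_distance` and `consensus_cost`, counts the point pairs that two clusterings treat differently (together in one, apart in the other).

```python
from itertools import combinations


def disagreement_distance(a, b):
    """Number of point pairs {i, j} that are co-clustered in exactly one of
    the two clusterings a and b (cluster-label lists over the same points)."""
    if len(a) != len(b):
        raise ValueError("clusterings must cover the same points")
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def consensus_cost(candidate, inputs):
    """Assumed consensus objective: total disagreement distance from a
    candidate clustering to every input clustering."""
    return sum(disagreement_distance(candidate, c) for c in inputs)
```

For the inputs `[[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]`, the candidate `[0, 0, 1, 1]` has cost 0 + 3 + 3 = 6, whereas putting all four points in one cluster costs 4 + 3 + 3 = 10. The distance depends only on which pairs are co-clustered, so relabeling clusters does not change it.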

Problem

Research questions and friction points this paper is trying to address.

Develop fair consensus clustering with protected attributes

Modify existing clustering to enforce fair representation

Provide approximation algorithms for unequal group sizes
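To make the post-processing problem concrete, here is an exhaustive-search sketch of closest fair clustering on tiny instances: enumerate every partition of the points, keep the fair ones, and return one with the smallest disagreement distance to the input clustering. This brute force is a stand-in for illustration only, not the optimal or near-linear-time algorithms described in the paper. All names are hypothetical, and both the pairwise-disagreement distance and the exact-proportionality fairness test are assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def set_partitions(n):
    """Yield every partition of n points as a restricted-growth label list."""
    def extend(prefix, next_label):
        if len(prefix) == n:
            yield list(prefix)
            return
        # A point may join any existing cluster or open exactly one new one.
        for label in range(next_label + 1):
            prefix.append(label)
            yield from extend(prefix, max(next_label, label + 1))
            prefix.pop()
    yield from extend([], 0)


def is_fair(labels, groups):
    """Assumed fairness: every cluster matches the global group proportions."""
    n = len(groups)
    share = {g: Fraction(c, n) for g, c in Counter(groups).items()}
    clusters = {}
    for lab, g in zip(labels, groups):
        clusters.setdefault(lab, Counter())[g] += 1
    return all(
        Fraction(cnt[g], sum(cnt.values())) == s
        for cnt in clusters.values()
        for g, s in share.items()
    )


def disagreement_distance(a, b):
    """Pairs of points co-clustered in exactly one of a and b."""
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def closest_fair_clustering_bruteforce(labels, groups):
    """Exhaustively search for a fair clustering closest to `labels`.

    Exponential (Bell-number) time: only for a handful of points.
    Ties are broken by enumeration order.
    """
    best, best_cost = None, None
    for candidate in set_partitions(len(labels)):
        if not is_fair(candidate, groups):
            continue
        cost = disagreement_distance(candidate, labels)
        if best_cost is None or cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost
```

With groups `["r", "r", "b", "b"]` and the monochromatic input `[0, 0, 1, 1]`, every fair clustering sits at distance 4 from the input, so the search returns cost 4. The single-cluster partition is always fair, so a result always exists for non-empty input.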

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fair consensus clustering with protected attributes

Optimal algorithm for equal group representation

Near-linear-time constant-factor approximation for two groups with unequal proportions
