🤖 AI Summary
Centroid-based clustering methods optimize global objectives and may fail to adequately represent large groups of datapoints, and this work shows that verifying the existing proportionality criterion mPJR is coNP-hard. The authors first introduce mPJR+, a metric analog of PJR+, but verifying it relies on repeated submodular minimization and is impractical at scale. They therefore propose DC-mPJR+ (Default Coalitions mPJR+), which restricts the guarantee to a limited set of coalitions around unselected centers and admits an O(mn log n + mnk) verification algorithm. DC-mPJR+ is satisfied by SEAR and is tied to mPJR+: any γ-DC-mPJR+ solution also satisfies (γ+2)-mPJR+. The result is a practical, theoretically grounded way to audit proportional representation in clustering.
📝 Abstract
Popular centroid-based clustering methods are typically optimized for global objectives and may fail to adequately represent large groups of datapoints. To address this concern, recent work puts forward clustering analogs of social choice proportionality concepts, such as Proportionally Representative Fairness (also known as mPJR). For proportionality guarantees to be useful in practice, they must be (a) achievable and (b) efficiently auditable, so that one can check whether standard approaches, such as $k$-means, which are not guaranteed to provide proportional representation in general, nevertheless output proportional solutions on specific inputs. In this work, we study the computational complexity of verifying proportional representation in clustering. We first show that verifying mPJR is coNP-hard. Inspired by PJR+ -- a strengthening of PJR that is polynomial-time verifiable in the committee voting setting -- we introduce mPJR+ as its metric analog. However, verifying mPJR+ relies on repeated submodular minimization, rendering it impractical at scale. Hence, we introduce Default Coalitions mPJR+ (DC-mPJR+): a new proportionality concept that offers representation guarantees to a restricted set of coalitions around unselected centers, and as a result, admits an $O(mn \log n + mnk)$ verification algorithm. DC-mPJR+ is satisfied by SEAR and remains a meaningful proxy for global fairness: any solution satisfying $\gamma$-DC-mPJR+ also satisfies $(\gamma + 2)$-mPJR+. Together, our results identify a practical and theoretically grounded path for auditing proportional representation in clustering.
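The approximation guarantee at the end of the abstract can be read as a containment of solution sets. In the sketch below, the set notation is introduced only for illustration and is not taken from the paper: $\mathcal{F}^{\mathrm{DC}}_{\gamma}$ denotes the solutions satisfying $\gamma$-DC-mPJR+, and $\mathcal{F}_{\gamma}$ denotes those satisfying $\gamma$-mPJR+. The choice $\gamma = 1$ in the second line is likewise just an example value, assumed to be admissible.

$$
\mathcal{F}^{\mathrm{DC}}_{\gamma} \subseteq \mathcal{F}_{\gamma + 2}
\quad \text{for every admissible } \gamma,
\qquad \text{e.g.} \qquad
X \in \mathcal{F}^{\mathrm{DC}}_{1} \;\Longrightarrow\; X \in \mathcal{F}_{3}.
$$

Under this reading, a clustering that passes the efficient DC-mPJR+ audit comes with an mPJR+ certificate whose parameter is larger by an additive 2, without running the submodular-minimization-based check.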