Fair Diversity Maximization with Few Representatives

📅 2025-06-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies fairness-constrained diversity maximization: selecting $k$ representative points from a labeled dataset so that the minimum pairwise distance is maximized while each label stays within its budget, with a focus on the case where only a constant number of points is chosen per label. The proposed randomized algorithm has three stages: pre-pruning, padded decomposition, and label-aware assignment. Pre-pruning ensures that points sharing a label are far from each other; the randomized padded decomposition places points in clusters so that a feasible solution with at most one point per cluster exists and any feasible solution is diverse; the assignment stage maps clusters to labels and picks one representative with the matching label from each cluster. The resulting approximation ratio is $\Omega(\sqrt{\log m}/m)$, where $m$ is the number of labels, which improves on the previous state of the art. Experiments on large datasets support the effectiveness of the approach.

📝 Abstract

Diversity maximization is a well-studied problem in which the goal is to find $k$ diverse items. Fair diversity maximization aims to select a diverse subset of $k$ items from a large dataset while requiring that each group of items be well represented in the output. More formally, given a set of items with labels, the goal is to find $k$ items that maximize the minimum pairwise distance in the set while keeping the number of items selected from each label within some budget. In many cases, one is only interested in selecting a handful (say, a constant number) of items from each group. In this scenario we show that a randomized algorithm based on padded decompositions improves the state-of-the-art approximation ratio to $\sqrt{\log(m)}/(3m)$, where $m$ is the number of labels. The algorithm works in several stages: ($i$) a preprocessing pruning step, which ensures that points with the same label are far away from each other; ($ii$) a decomposition phase, where points are randomly placed in clusters such that there is a feasible solution with at most one point per cluster and any feasible solution is diverse; ($iii$) an assignment phase, where clusters are assigned to labels and a representative point with the corresponding label is selected from each cluster. We experimentally verify the effectiveness of our algorithm on large datasets.
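To make the formal statement concrete, here is a minimal Python sketch of how a candidate selection could be scored and checked. It assumes Euclidean distance and assumes that a budget is an inclusive `(lower, upper)` pair per label; the abstract only says "within some budget", so both choices, along with the function names `min_pairwise_distance` and `within_budgets`, are illustrative assumptions rather than the paper's definitions.

```python
import math
from collections import Counter
from itertools import combinations


def min_pairwise_distance(points):
    """Diversity score: the smallest Euclidean distance between any two selected points."""
    return min(math.dist(p, q) for p, q in combinations(points, 2))


def within_budgets(labels, budgets):
    """Check that each label's count lies in its (lower, upper) budget, inclusive.

    `budgets` maps label -> (lower, upper). This budget shape is an assumption
    made for illustration; labels absent from `budgets` are not checked.
    """
    counts = Counter(labels)
    return all(lo <= counts.get(lab, 0) <= hi for lab, (lo, hi) in budgets.items())


# Hypothetical selection of k = 3 points with labels "a", "b", "a".
selected_points = [(0, 0), (3, 4), (6, 8)]
selected_labels = ["a", "b", "a"]
budgets = {"a": (1, 2), "b": (1, 1)}

score = min_pairwise_distance(selected_points)
feasible = within_budgets(selected_labels, budgets)
```

A selection is a valid answer only when it is feasible; among feasible selections of size $k$, the problem asks for the one with the largest score.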

Problem

Research questions and friction points this paper is trying to address.

Maximize diversity while ensuring fair group representation

Select diverse items with label-specific budget constraints

Improve approximation ratio for few representatives per group

Innovation

Methods, ideas, or system contributions that make the work stand out.

Preprocessing pruning so that points sharing a label are far apart

Randomized padded decomposition of the points into clusters

Assignment of clusters to labels, with one matching representative per cluster
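The first two stages can be sketched roughly as follows. This is not the paper's procedure: the pruning is a simple greedy filter, and the decomposition is a common random ball-carving construction (shuffled centers, one random radius in $[\Delta/4, \Delta/2]$) used here as a stand-in for whatever padded decomposition the authors use. The function names, the `radius` and `delta` parameters, and the Euclidean metric are all assumptions for illustration. The assignment stage is omitted.

```python
import math
import random


def prune_same_label(points, labels, radius):
    """Greedy filter (assumed, not the paper's rule): keep a point only if it is
    at least `radius` away from every already-kept point with the same label.
    Returns the indices of the kept points."""
    kept = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        if all(labels[j] != lab or math.dist(p, points[j]) >= radius for j in kept):
            kept.append(i)
    return kept


def ball_carving(points, indices, delta, rng):
    """Random ball carving over the points in `indices`.

    Shuffle the candidate centers, draw one radius r uniformly from
    [delta/4, delta/2], and assign each point to the first center in shuffled
    order that lies within distance r. Every cluster has diameter at most delta.
    Returns a dict mapping point index -> center index."""
    order = list(indices)
    rng.shuffle(order)
    r = rng.uniform(delta / 4, delta / 2)
    cluster_of = {}
    for i in indices:
        for c in order:
            # A point is at distance 0 from itself, so a center is always found.
            if math.dist(points[i], points[c]) <= r:
                cluster_of[i] = c
                break
    return cluster_of


# Hypothetical toy input: four points on a line with labels a, a, b, a.
points = [(0.0, 0.0), (0.5, 0.0), (0.2, 0.0), (5.0, 0.0)]
labels = ["a", "a", "b", "a"]

kept = prune_same_label(points, labels, radius=1.0)
clusters = ball_carving(points, kept, delta=2.0, rng=random.Random(0))
```

In this toy input the second point is pruned because it sits within distance 1 of an earlier point with the same label, and the two remaining nearby points land in the same cluster while the distant point forms its own.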
