AI Summary
This paper addresses diverse representation in approval-based committee elections under incomplete or noisy preference information, with diversity measured by maximum coverage. We establish upper and lower bounds on query complexity for this setting. Because non-adaptive querying is subject to a query-complexity lower bound, we propose adaptive greedy and local search algorithms that query voters and estimate the population's opinion from the responses. Theoretically, we prove that, with high probability, these algorithms return a solution close to a $(1-1/e)$-approximation of the optimum within a bounded number of queries. Empirically, on real-world Polis data, the algorithms perform considerably better than their theoretical guarantees suggest and remain robust to two canonical types of imperfection: label noise and preference incompleteness. Our work thus advances both the theoretical understanding and the practical deployment of diversity-aware committee selection under imperfect preference elicitation.
Abstract
We study diversity in approval-based committee elections with incomplete or inaccurate information. As is standard in the literature on approval-based multi-winner voting, we define diversity according to the maximum coverage problem, which is known to be NP-complete, with a best attainable polynomial-time approximation ratio of $1-1/e$. In the incomplete information model, voters can vote on only a small portion of the candidates. We suggest a greedy algorithm and a local search algorithm that query voters and use the query responses to approximate the total population's opinion. For both algorithms, we prove an upper bound on the number of queries required to obtain, with high probability, a solution that is close to $(1-1/e)$-approximate. We also provide a lower bound on the query complexity of non-adaptive algorithms, which cannot adapt their querying strategy to information already obtained. In the inaccurate information setting, voters' responses are corrupted with probability $p\in(0,\frac{1}{2})$. We provide both an upper and a lower bound on the number of queries required to attain a $(1-1/e)$-approximate solution with high probability. Finally, using real data from Polis, we see that our algorithms perform remarkably better than the theoretical results suggest, with both incomplete and inaccurate information.
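To make the setting concrete, the following is a minimal sketch of greedy committee selection under the maximum coverage objective, together with a sampled variant that only sees query responses. It is an illustration under stated assumptions, not the paper's algorithm: the function names, the `query(voter, candidate)` oracle, the uniform voter sampling, and the per-candidate sample size `m` are all hypothetical choices made for this example.

```python
import random


def greedy_committee(approvals, candidates, k):
    """Full-information greedy for maximum coverage.

    approvals: list of sets, one approval set per voter.
    Returns (committee, number_of_covered_voters).
    """
    committee = []
    covered = set()  # indices of voters who approve some committee member
    for _ in range(k):
        best, best_gain = None, -1
        for c in candidates:
            if c in committee:
                continue
            # Marginal gain: uncovered voters who approve c.
            gain = sum(
                1 for i, a in enumerate(approvals)
                if i not in covered and c in a
            )
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            break
        committee.append(best)
        covered |= {i for i, a in enumerate(approvals) if best in a}
    return committee, len(covered)


def sampled_greedy_committee(query, n_voters, candidates, k, m, rng):
    """Adaptive greedy that estimates marginal gains from sampled queries.

    query(v, c) -> bool is a hypothetical oracle: does voter v approve c?
    m is an assumed number of sampled voters per candidate per round.
    """
    committee = []
    for _round in range(k):
        best, best_est = None, -1.0
        for c in candidates:
            if c in committee:
                continue
            hits = 0
            for _sample in range(m):
                v = rng.randrange(n_voters)
                # v counts toward the gain of c only if v approves c and
                # is not already covered by the current committee.
                if query(v, c) and not any(query(v, s) for s in committee):
                    hits += 1
            est = hits / m  # estimated fraction of newly covered voters
            if est > best_est:
                best, best_est = c, est
        if best is None:
            break
        committee.append(best)
    return committee
```

The sampled variant is adaptive in the sense used above: which voters are asked about which candidates in later rounds depends on the committee chosen so far. For example, with six voters whose approval sets are `{a}`, `{a}`, `{a, b}`, `{b}`, `{c}`, `{c}` and `k = 2`, the full-information greedy picks `a` first (three voters covered) and then `c` (two more), covering five of the six voters.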