Coloring for dispersion: A polynomial-time algorithm for cardinality-constrained 2-anticlustering

📅 2026-04-27

🤖 AI Summary

This study addresses the 2-Maximum Dispersion Problem with Cardinality Constraints (2-MDCC), which asks for a partition of an item set into two groups of prescribed sizes such that the minimum pairwise dissimilarity within a group is maximized. The paper shows that the problem is solvable in polynomial time by reducing it to a quadratic number of cardinality-constrained 2-coloring instances, each of which is transformed into a subset sum instance with bounded input values. Because the values are bounded, the pseudopolynomial dynamic programming algorithm for subset sum runs in polynomial time on these instances. A publicly available open-source implementation outperforms the previous integer linear programming solution by several orders of magnitude and processes large datasets ($n = 10{,}000$) in less than a second. The result settles the previously open question of whether 2-MDCC can be solved in polynomial time.

📝 Abstract

The $k$-Maximum Dispersion Problem with Cardinality Constraints ($k$-MDCC) asks for a partition of a given item set with pairwise dissimilarities into $k$ cardinality-constrained groups such that the minimum pairwise intra-group dissimilarity, also known as the dispersion, is maximized. The problem arises in the context of anticlustering, where the goal is to create maximally heterogeneous groups of items, with applications in psychological research, bioinformatics, and data science. It is known that $k$-MDCC is NP-hard for $k \geq 3$, but it has been an open question whether it can be solved in polynomial time for $k = 2$. We give a positive answer to this question by showing that $2$-MDCC can be solved by solving a quadratic number of cardinality-constrained 2-coloring problem instances ($2$-COLCC). We solve these instances by transforming them into a restricted class of subset sum instances. Although subset sum is NP-complete in general, the input values in this restricted class are bounded, which ensures that the pseudopolynomial dynamic programming algorithm runs in polynomial time. As a consequence, we obtain a polynomial-time algorithm for $2$-MDCC. We demonstrate that a publicly available open-source implementation of our new algorithm outperforms the previous integer linear programming solution by several orders of magnitude, so that even large datasets ($n = 10{,}000$) can be processed in less than a second.
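The reduction described in the abstract can be illustrated with a minimal Python sketch. This is one plausible reading of the approach, not the authors' implementation: the function names `two_colcc` and `max_dispersion_2`, the linear scan over candidate thresholds, and the per-component orientation step are all assumptions made for illustration. For a candidate dispersion $d$, every pair with dissimilarity below $d$ must be split across the two groups, which gives a 2-coloring instance. Each connected component of that graph has two sides, and choosing which side joins the first group is a subset sum over values that are at most $n$.

```python
from itertools import combinations

def two_colcc(n, edges, n1):
    """Hypothetical helper: 2-color the graph so that exactly n1 vertices
    get label 1. Returns a list of 0/1 labels, or None if impossible."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = [-1] * n
    comps = []  # one (side_a, side_b) pair of vertex lists per component
    for s in range(n):
        if color[s] != -1:
            continue
        color[s] = 0
        sides = ([s], [])
        stack = [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if color[v] == -1:
                    color[v] = 1 - color[u]
                    sides[color[v]].append(v)
                    stack.append(v)
                elif color[v] == color[u]:
                    return None  # odd cycle: the graph is not 2-colorable
        comps.append(sides)
    # Subset-sum DP over group-1 sizes. All values are bounded by n,
    # so the table has at most (number of components) * (n1 + 1) entries.
    reach = [{0: None}]  # reach[i][size] = (previous size, side picked)
    for a, b in comps:
        nxt = {}
        for size in reach[-1]:
            for pick, side in ((0, a), (1, b)):
                t = size + len(side)
                if t <= n1 and t not in nxt:
                    nxt[t] = (size, pick)
        reach.append(nxt)
    if n1 not in reach[-1]:
        return None
    # Backtrack to recover which side of each component joins group 1.
    labels = [0] * n
    size = n1
    for i in range(len(comps), 0, -1):
        prev, pick = reach[i][size]
        for v in comps[i - 1][pick]:
            labels[v] = 1
        size = prev
    return labels

def max_dispersion_2(D, n1):
    """Hypothetical driver: try each distinct dissimilarity as a threshold,
    largest first, and return the first feasible (threshold, labels)."""
    n = len(D)
    pairs = list(combinations(range(n), 2))
    for d in sorted({D[i][j] for i, j in pairs}, reverse=True):
        # Pairs closer than d may not share a group.
        edges = [(i, j) for i, j in pairs if D[i][j] < d]
        labels = two_colcc(n, edges, n1)
        if labels is not None:
            return d, labels
    return None
```

For example, with four points at positions 0, 1, 10, and 11 on a line and `D[i][j]` set to their absolute differences, `max_dispersion_2(D, 2)` pairs the points at 0 and 10 in one group and those at 1 and 11 in the other. Since there are at most $n(n-1)/2$ distinct dissimilarities, the scan solves at most a quadratic number of coloring instances; a binary search over the sorted thresholds would also work because feasibility is monotone in $d$.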

Problem

Research questions and friction points this paper is trying to address.

Maximum Dispersion

Cardinality Constraints

Anticlustering

2-Partitioning

Computational Complexity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Polynomial-Time Algorithm

2-Anticlustering

Cardinality-Constrained Coloring