Coloring for dispersion: A polynomial-time algorithm for cardinality-constrained 2-anticlustering

📅 2026-04-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the 2-maximum dispersion problem with cardinality constraints (2-MDCC), which seeks to partition a set of elements into two subsets of prescribed sizes such that the minimum dissimilarity between any two elements assigned to the same subset is maximized. The authors establish, for the first time, that this problem is solvable in polynomial time by reducing it to a quadratic number of cardinality-constrained 2-coloring instances, which are in turn transformed into a restricted class of subset-sum instances. Because the input values in this restricted class are bounded, the pseudo-polynomial dynamic programming algorithm for subset sum runs in polynomial time. Their open-source implementation outperforms the previous integer linear programming approach by several orders of magnitude and processes datasets of 10,000 items in under one second. The result settles the open question of whether 2-MDCC can be solved in polynomial time.
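To make the objective concrete, the following minimal Python sketch evaluates the dispersion of a given split and finds the best split by exhaustive enumeration. It assumes the dissimilarities arrive as a symmetric matrix `D` (a list of lists) and that group 1 must contain exactly `n1` items. The function names `dispersion` and `brute_force_2mdcc` are hypothetical and not taken from the authors' implementation. Enumeration is exponential in the number of items, so this is only a reference for tiny inputs, not the polynomial-time method described here.

```python
from itertools import combinations
from math import inf


def dispersion(D, groups):
    """Smallest dissimilarity between two items in the same group (inf if no such pair)."""
    best = inf
    for group in groups:
        for i, j in combinations(group, 2):
            best = min(best, D[i][j])
    return best


def brute_force_2mdcc(D, n1):
    """Try every choice of n1 items for group 1 (hypothetical helper, tiny inputs only)."""
    n = len(D)
    best_value, best_groups = -inf, None
    for first in combinations(range(n), n1):
        chosen = set(first)
        second = [i for i in range(n) if i not in chosen]
        groups = (list(first), second)
        value = dispersion(D, groups)
        if value > best_value:  # keep the first split that reaches a new maximum
            best_value, best_groups = value, groups
    return best_value, best_groups
```

For four items at positions 0, 1, 2 and 3 on a line, with `D[i][j] = abs(i - j)` and `n1 = 2`, `brute_force_2mdcc(D, 2)` returns `(2, ([0, 2], [1, 3]))`: interleaving the items keeps every same-group pair at distance 2.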

📝 Abstract
The $k$-Maximum Dispersion Problem with Cardinality Constraints ($k$-MDCC) asks for a partition of a given item set with pairwise dissimilarities into $k$ cardinality-constrained groups such that the minimum pairwise intra-group dissimilarity, also known as the dispersion, is maximized. The problem arises in the context of anticlustering, where the goal is to create maximally heterogeneous groups of items, with applications in psychological research, bioinformatics, and data science. It is known that $k$-MDCC is NP-hard for $k \geq 3$, but it has been an open question whether it can be solved in polynomial time for $k = 2$. We give a positive answer to this question by showing that $2$-MDCC can be solved by solving a quadratic number of cardinality-constrained 2-coloring problem instances ($2$-COLCC). We solve these instances by transforming them into a restricted class of subset sum instances. Although subset sum is NP-complete in general, the input values in this restricted class are bounded, so the pseudopolynomial dynamic programming algorithm runs in polynomial time. As a consequence, we obtain a polynomial-time algorithm for $2$-MDCC. We demonstrate that a publicly available open-source implementation of our new algorithm outperforms the previous integer linear programming solution by several orders of magnitude, so that even large datasets ($n = 10{,}000$) can be processed in less than a second.
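The reduction chain in the abstract can be illustrated with a small Python sketch. The abstract does not spell out the construction, so what follows is our own reconstruction under stated assumptions, not the authors' algorithm or code. For a threshold `d`, every pair with dissimilarity below `d` becomes an edge of a hypothetical conflict graph, and a split with dispersion at least `d` exists exactly when that graph has a 2-coloring whose color classes have sizes `n1` and `n - n1`. Each connected component must then be bipartite and sends one of its two sides to group 1. Choosing sides whose sizes add up to `n1` is a subset-sum-style problem whose values are bounded by `n`, so a table over the sums `0..n1` solves it in polynomial time. The optimal dispersion is one of the pairwise dissimilarities, so only those values are tested. All function names (`bipartite_components`, `colcc`, `solve_2mdcc`) are hypothetical.

```python
from collections import deque
from itertools import combinations
from math import inf


def bipartite_components(n, edges):
    """2-color each connected component of the conflict graph with BFS.

    Returns a list of (side0, side1) vertex lists, or None if some component
    has an odd cycle, in which case no 2-coloring exists.
    """
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    color = [-1] * n
    parts = []
    for start in range(n):
        if color[start] != -1:
            continue
        color[start] = 0
        sides = ([start], [])
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] == -1:
                    color[v] = 1 - color[u]
                    sides[color[v]].append(v)
                    queue.append(v)
                elif color[v] == color[u]:
                    return None  # odd cycle: not bipartite
        parts.append(sides)
    return parts


def colcc(n, edges, n1):
    """Cardinality-constrained 2-coloring: return a group of size n1, or None.

    Each component puts one of its sides into group 1; side sizes are at most n,
    so a subset-sum-style DP over the sums 0..n1 decides feasibility.
    """
    parts = bipartite_components(n, edges)
    if parts is None:
        return None
    # choice[k][s]: side of component k-1 used to first reach sum s (None = unreachable)
    choice = [[None] * (n1 + 1) for _ in range(len(parts) + 1)]
    choice[0][0] = -1  # sentinel: the empty prefix reaches sum 0
    for k, sides in enumerate(parts):
        for s in range(n1 + 1):
            if choice[k][s] is None:
                continue
            for side in (0, 1):
                t = s + len(sides[side])
                if t <= n1 and choice[k + 1][t] is None:
                    choice[k + 1][t] = side
    if choice[len(parts)][n1] is None:
        return None
    group1, s = [], n1  # walk the table backwards to recover the chosen sides
    for k in range(len(parts), 0, -1):
        side = choice[k][s]
        group1.extend(parts[k - 1][side])
        s -= len(parts[k - 1][side])
    return sorted(group1)


def solve_2mdcc(D, n1):
    """Maximize the minimum intra-group dissimilarity over splits of sizes (n1, n - n1)."""
    n = len(D)
    pairs = list(combinations(range(n), 2))
    candidates = sorted({D[i][j] for i, j in pairs})

    def feasible(d):
        # pairs closer than d must be separated, so they become conflict edges
        return colcc(n, [(i, j) for i, j in pairs if D[i][j] < d], n1)

    best = feasible(candidates[0]) if candidates else list(range(n1))
    lo, hi = 0, len(candidates) - 1  # candidates[lo] is always feasible
    while lo < hi:  # feasibility is monotone in d, so binary search the largest one
        mid = (lo + hi + 1) // 2
        group1 = feasible(candidates[mid])
        if group1 is not None:
            lo, best = mid, group1
        else:
            hi = mid - 1
    chosen = set(best)
    group2 = [i for i in range(n) if i not in chosen]
    value = min((D[i][j] for g in (best, group2) for i, j in combinations(g, 2)), default=inf)
    return value, (best, group2)
```

For five items at positions 0 to 4 with `D[i][j] = abs(i - j)` and `n1 = 2`, `solve_2mdcc(D, 2)` returns `(2, ([1, 3], [0, 2, 4]))`. Testing every candidate threshold in turn would correspond to the quadratic number of 2-COLCC instances mentioned in the abstract; the sketch binary-searches the sorted candidates instead, which is valid because raising `d` only adds conflict edges and so can never turn an infeasible threshold into a feasible one.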

Problem

Maximum Dispersion
Cardinality Constraints
Anticlustering
2-Partitioning
Computational Complexity

Innovation

polynomial-time algorithm
2-anticlustering
cardinality-constrained coloring
subset sum
maximum dispersion
🔎 Similar Papers
2024-05-13 · Proceedings of the ACM Symposium on Principles of Distributed Computing · Citations: 1
2024-03-11 · Workshop on Approximation and Online Algorithms · Citations: 1
2024-08-29 · INFORMS Journal on Computing · Citations: 0

Authors
Nguyen Khoa Tran
Department of Computer Science, Heinrich Heine University Düsseldorf, Germany; Center for Digital Medicine, Düsseldorf, Germany
Lin Mu
Department of Computer Science, Heinrich Heine University Düsseldorf, Germany
Martin Papenberg
Department of Experimental Psychology, Heinrich Heine University Düsseldorf, Germany; Center for Digital Medicine, Düsseldorf, Germany
Gunnar W. Klau
Heinrich Heine University Düsseldorf
Bioinformatics, Computational Biology, Combinatorial Algorithms, Algorithm Engineering, Graph Algorithms