🤖 AI Summary
This work addresses the limited control over community size in conventional community detection methods. These methods typically rely on a resolution parameter to influence the average community size indirectly, so they cannot enforce domain-specific requirements on the size range of individual communities. Within the modularity optimization framework, we propose an approach that incorporates user-specified lower and upper bounds on community size directly into the optimization. We introduce a heuristic algorithm, implemented within the Leiden framework for scalability and efficiency, together with an exact integer programming formulation that serves as a benchmark. Experiments on synthetic and real-world networks show that the method consistently yields high-modularity partitions that strictly satisfy the prescribed size constraints, and that it outperforms the common practice of tuning the resolution parameter. The implementation is publicly available as part of the Python Leiden algorithm package.
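For reference, one standard way to write exact modularity maximization as an integer program is the pairwise (clique-partitioning) formulation. Size bounds fit into it naturally because the size of node $i$'s community is $1 + \sum_{j \ne i} x_{ij}$. The sketch below assumes an undirected graph with adjacency $A_{ij}$, degrees $k_i$, total edge weight $m$, resolution $\gamma$, size bounds $L \le U$, and binary variables $x_{ij}$ equal to 1 when $i$ and $j$ share a community. It is a generic formulation under these assumptions, not necessarily the exact model used in the paper. The $i = j$ terms of modularity are constant, so the objective equals $Q$ up to a factor $1/m$ and an additive constant.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i<j} B_{ij}\, x_{ij}, \qquad B_{ij} = A_{ij} - \gamma\,\frac{k_i k_j}{2m} \\
\text{s.t.}\quad & x_{ij} + x_{jk} - x_{ik} \le 1,\quad x_{ij} - x_{jk} + x_{ik} \le 1,\quad -x_{ij} + x_{jk} + x_{ik} \le 1 \qquad \forall\, i<j<k, \\
& L \;\le\; 1 + \sum_{j \ne i} x_{ij} \;\le\; U \qquad \forall\, i, \\
& x_{ij} = x_{ji} \in \{0, 1\}.
\end{aligned}
```

The triangle inequalities force "same community" to be transitive, so every feasible $x$ encodes a partition. The last constraint family bounds every community's size between $L$ and $U$.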
📝 Abstract
When searching for communities in networks, domain experts may have prior expectations about the size of communities. Yet community detection methods do not normally optimize communities under cluster size constraints. Multi-resolution techniques let users control the average community size indirectly by changing a resolution parameter, but this practice does not control the size of individual communities. Here we study the problem of size-constrained community detection in the context of modularity optimization, where the size of every community is limited to a user-specified range of values. We propose a heuristic for modularity optimization under community size constraints. To demonstrate the reliability of the proposed heuristic, we also formulate an exact integer optimization model and use its results as a baseline. Our analysis of synthetic benchmarks and real networks demonstrates the issues with the currently common practice of changing resolution parameters and reveals the advantages of the proposed methods as a principled way of obtaining size-constrained communities. The proposed method is publicly available in the Python Leiden algorithm package.
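The following minimal Python sketch shows one way hard size bounds can be imposed during the local-moving step of a Louvain/Leiden-style heuristic. It is not the authors' algorithm or the Leiden package's code. The function names (`modularity`, `feasible_start`, `size_constrained_local_moving`) are hypothetical, the graph is assumed to be undirected without self-loops and stored as a dict of dicts, and only single-node moves are used, with no refinement or aggregation phase. The sketch starts from a partition whose sizes already lie in `[lower, upper]`. It then accepts a move only if the move increases modularity and keeps both the source and target communities within the bounds.

```python
from collections import defaultdict

# Hypothetical helper names for illustration; this is not the Leiden package API.


def modularity(adj, membership, gamma=1.0):
    """Modularity with resolution gamma for an undirected graph with at least one edge.

    adj: dict node -> dict neighbour -> weight (symmetric, no self-loops).
    membership: dict node -> community label.
    """
    degree = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    two_m = sum(degree.values())
    internal = defaultdict(float)  # sum of A_ij over ordered pairs inside each community
    total = defaultdict(float)     # sum of degrees K_c of each community
    for v, nbrs in adj.items():
        total[membership[v]] += degree[v]
        for u, w in nbrs.items():
            if membership[u] == membership[v]:
                internal[membership[v]] += w
    return sum(internal[c] / two_m - gamma * (total[c] / two_m) ** 2 for c in total)


def feasible_start(nodes, lower, upper):
    """Split nodes into contiguous chunks whose sizes all lie in [lower, upper]."""
    n = len(nodes)
    if n == 0:
        return {}
    k = -(-n // upper)  # ceil(n / upper): fewest chunks that respect the upper bound
    if k * lower > n:   # more chunks would only make the lower bound harder to meet
        raise ValueError("no partition satisfies the size bounds")
    base, extra = divmod(n, k)  # chunk sizes differ by at most one
    membership, start = {}, 0
    for c in range(k):
        size = base + (1 if c < extra else 0)
        for v in nodes[start:start + size]:
            membership[v] = c
        start += size
    return membership


def size_constrained_local_moving(adj, lower, upper, gamma=1.0, max_sweeps=100):
    """Greedy single-node moves that raise modularity and keep every size in [lower, upper]."""
    nodes = sorted(adj)
    membership = feasible_start(nodes, lower, upper)
    degree = {v: sum(adj[v].values()) for v in nodes}
    m = sum(degree.values()) / 2
    if m == 0:
        return membership
    size, tot = defaultdict(int), defaultdict(float)
    for v in nodes:
        size[membership[v]] += 1
        tot[membership[v]] += degree[v]
    for _ in range(max_sweeps):
        improved = False
        for v in nodes:
            a = membership[v]
            if 0 < size[a] - 1 < lower:
                continue  # leaving would shrink community a below the lower bound
            links = defaultdict(float)  # edge weight from v to each neighbouring community
            for u, w in adj[v].items():
                links[membership[u]] += w
            k_va = links.get(a, 0.0)
            best, best_gain = a, 0.0
            for b, k_vb in links.items():
                if b == a or size[b] + 1 > upper:
                    continue  # moving into b would exceed the upper bound
                # Modularity change for moving v from a to b (tot[a] still includes v).
                gain = (k_vb - k_va) / m - gamma * degree[v] * (tot[b] - tot[a] + degree[v]) / (2 * m * m)
                if gain > best_gain + 1e-12:
                    best, best_gain = b, gain
            if best != a:
                membership[v] = best
                size[a] -= 1
                size[best] += 1
                tot[a] -= degree[v]
                tot[best] += degree[v]
                improved = True
        if not improved:
            break
    return membership
```

Consider two interleaved 4-cliques, `{0, 2, 4, 6}` and `{1, 3, 5, 7}`, with `lower=1` and `upper=6`. The chunked start `{0, 1, 2, 3}` / `{4, 5, 6, 7}` is moved to the two cliques. Because no move may violate a bound, this kind of heuristic can also get stuck. With `upper=4` on the same graph, both communities start full, so no node can move at all. Limitations like this are why an exact baseline is useful for judging a heuristic's quality.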