Large Scale Community-Aware Network Generation

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Real-world networks often lack ground-truth community labels, which hinders rigorous evaluation of community detection algorithms. Method: We propose RECCS+ and RECCS++, two enhanced versions of RECCS, a synthetic network generator that takes a network and its clustering as input and produces a synthetic network with ground-truth community labels. RECCS+ keeps the original RECCS algorithm but parallelizes it through an orchestrator that coordinates pipeline components across multiple processes and uses multithreading; RECCS++ adds further algorithmic optimizations at the cost of a modest accuracy tradeoff. Both retain the modular pipeline, in which each generated cluster preserves the connectivity, minimum degree, and degree sequence distribution of its input cluster. Contribution/Results: RECCS+ and RECCS++ achieve speedups of up to 49× and 139×, respectively, on the benchmark datasets, and RECCS++ scales to networks with over 100 million nodes and nearly 2 billion edges. This gives a scalable way to generate benchmark networks for evaluating community detection.

📝 Abstract
Community detection, or network clustering, is used to identify latent community structure in networks. Due to the scarcity of labeled ground truth in real-world networks, evaluating these algorithms poses significant challenges. To address this, researchers use synthetic network generators that produce networks with ground-truth community labels. RECCS is one such algorithm that takes a network and its clustering as input and generates a synthetic network through a modular pipeline. Each generated ground-truth cluster preserves key characteristics of the corresponding input cluster, including connectivity, minimum degree, and degree sequence distribution. The output consists of a synthetically generated network and disjoint ground-truth cluster labels for all nodes. In this paper, we present two enhanced versions: RECCS+ and RECCS++. RECCS+ maintains algorithmic fidelity to the original RECCS while introducing parallelization through an orchestrator that coordinates algorithmic components across multiple processes and employs multithreading. RECCS++ builds upon this foundation with additional algorithmic optimizations to achieve further speedup. Our experimental results demonstrate that RECCS+ and RECCS++ achieve speedups of up to 49x and 139x, respectively, on our benchmark datasets, with RECCS++'s additional performance gains involving a modest accuracy tradeoff. With this newfound performance, RECCS++ can now scale to networks with over 100 million nodes and nearly 2 billion edges.
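To make the preserved cluster characteristics concrete, here is a minimal Python sketch that computes them for an input network and clustering. It is not the RECCS code: the function name `cluster_stats`, the edge-list and label-dictionary inputs, and the choice to measure degrees inside each cluster are assumptions made for illustration. Connectivity is also simplified to a connected/disconnected flag, which may be weaker than the notion used in the paper.

```python
from collections import defaultdict


def cluster_stats(edges, labels):
    """Per-cluster degree sequence, minimum degree, and a connectivity flag.

    edges  : iterable of (u, v) pairs of an undirected network
    labels : dict mapping every node to its (disjoint) cluster label
    Degrees are counted inside each cluster only (an assumption of this sketch).
    """
    # Adjacency restricted to edges whose endpoints share a cluster.
    adj = {v: set() for v in labels}
    for u, v in edges:
        if u != v and labels[u] == labels[v]:
            adj[u].add(v)
            adj[v].add(u)

    members = defaultdict(list)
    for node, cluster in labels.items():
        members[cluster].append(node)

    stats = {}
    for cluster, nodes in members.items():
        degrees = sorted((len(adj[v]) for v in nodes), reverse=True)
        # Depth-first search from one member to test connectedness.
        seen = {nodes[0]}
        stack = [nodes[0]]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        stats[cluster] = {
            "degree_sequence": degrees,
            "min_degree": degrees[-1],
            "connected": len(seen) == len(nodes),
        }
    return stats
```

A generator of this kind could run such a function on both the input and the synthetic network and compare the results cluster by cluster to check how well the characteristics were preserved.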
Problem

Research questions and friction points this paper is trying to address.

Generating synthetic networks with ground-truth community labels for evaluation
Enhancing network generation algorithms through parallelization and optimization techniques
Scaling community-aware network generation to handle massive networks efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

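RECCS+ adds parallelization through an orchestrator that coordinates pipeline components across multiple processes, combined with multithreading
RECCS++ introduces algorithmic optimizations for additional speedup, with a modest accuracy tradeoff
RECCS++ scales to networks with over 100 million nodes and nearly 2 billion edges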
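The orchestrator idea can be sketched in a few lines of Python. This is a hedged illustration only, not the RECCS+ implementation: `orchestrate` and the per-cluster stage `intra_cluster_edges` are hypothetical names, and the stage is a stand-in for whatever work a real pipeline component performs. The sketch defaults to a thread pool; passing `concurrent.futures.ProcessPoolExecutor` as `executor_cls` gives process-level parallelism with the same interface.

```python
from concurrent.futures import ThreadPoolExecutor


def intra_cluster_edges(task):
    """Hypothetical per-cluster stage: keep edges with both endpoints in the cluster."""
    cluster_id, nodes, edges = task
    node_set = set(nodes)
    kept = [(u, v) for u, v in edges if u in node_set and v in node_set]
    return cluster_id, kept


def orchestrate(clusters, edges, stage=intra_cluster_edges,
                executor_cls=ThreadPoolExecutor, workers=4):
    """Fan one independent task per cluster out to a worker pool and merge results.

    clusters : dict mapping cluster id -> list of member nodes
    edges    : list of (u, v) pairs for the whole network
    """
    tasks = [(cid, nodes, edges) for cid, nodes in clusters.items()]
    with executor_cls(max_workers=workers) as pool:
        # Executor.map yields results in task order, so the merge is deterministic.
        return dict(pool.map(stage, tasks))
```

Because each cluster is handled as an independent task, the work parallelizes without shared mutable state; in this sketch the only coordination point is the final merge into one dictionary.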
Vikram Ramavarapu
Computer Science, University of Illinois at Urbana-Champaign, Urbana, 61801, IL, USA
João Alfredo Cardoso Lamy
Computer Science, Insper Institute, São Paulo, SP, Brazil
Mohammad Dindoost
Data Science, New Jersey Institute of Technology, Newark, 07102, NJ, USA
David A. Bader
Distinguished Professor, New Jersey Institute of Technology
data science, high performance computing, cybersecurity, massive-scale analytics, computational genomics