🤖 AI Summary
This study proposes an approach that integrates node-level differential privacy with statistical network models to generate synthetic contact networks for infectious disease modeling while preserving individual privacy, particularly where the contacts reveal sensitive behavior such as sexual relationships or drug use. The method adds calibrated noise to network summary statistics, fits either a stochastic block model (SBM) or an exponential random graph model (ERGM) to the noisy statistics, and simulates agent-based SIS epidemic dynamics on the resulting synthetic networks. To the best of our knowledge, this is the first work to combine node-level differential privacy with ERGMs or SBMs for epidemiological data synthesis. Experiments on egocentric sexual network data from the ARTNet study indicate that the noise added for privacy is small relative to sampling and model-misspecification error, and that the synthetic networks reproduce the key transmission dynamics and intervention effects obtained without privacy constraints.
📝 Abstract
Epidemiologic studies of infectious diseases often rely on models of contact networks to capture the complex interactions that govern disease spread, and ongoing projects aim to vastly increase the scale at which such data can be collected. However, contact networks may include sensitive information, such as sexual relationships or drug use behavior. Protecting individual privacy while maintaining the scientific usefulness of the data is crucial. For disease spread simulation studies based on a sensitive network, we propose a privacy-preserving pipeline that integrates differential privacy (DP) with statistical network models such as stochastic block models (SBMs) and exponential random graph models (ERGMs). Our pipeline comprises three steps: (1) compute network summary statistics under node-level DP, which protects each individual's entire contribution to the network; (2) fit a statistical model, such as an ERGM, to these summaries, which allows generating synthetic networks that reflect the structure of the original network; and (3) simulate disease spread on the synthetic networks using an agent-based model. We evaluate the effectiveness of our approach using a simple Susceptible-Infected-Susceptible (SIS) disease model under multiple configurations. We compare both numerical results, such as simulated disease incidence and prevalence, and qualitative conclusions, such as intervention effect size, on networks generated with and without differential privacy constraints. Our experiments are based on egocentric sexual network data from the ARTNet study (a survey about HIV-related behaviors). Our results show that the noise added for privacy is small relative to other sources of error (sampling and model misspecification). This suggests that, in principle, curators of such sensitive data can provide valuable epidemiologic insights while protecting privacy.
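As a rough illustration of the three-step pipeline, the following minimal Python sketch walks through one possible instance. It is not the authors' implementation, and every name and parameter in it is hypothetical. It rests on several simplifying assumptions: each respondent's egocentric report is treated as one record, reported degrees are clipped to a bound `max_degree` so that adding or removing a respondent changes the degree sum by at most that bound, the number of respondents is treated as public, and a one-block SBM (an Erdős–Rényi graph) stands in for the richer SBM or ERGM fit. The degree values in the demo are made up.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean_degree(degrees, max_degree, epsilon, rng):
    # Step 1 (assumed mechanism): clip each reported degree to [0, max_degree],
    # so one respondent changes the sum by at most max_degree, then add
    # Laplace noise with scale max_degree / epsilon to the sum.
    clipped = [min(max(d, 0), max_degree) for d in degrees]
    noisy_sum = sum(clipped) + laplace_noise(max_degree / epsilon, rng)
    mean = noisy_sum / len(degrees)  # respondent count assumed public
    # Post-processing: force the estimate back into the valid range.
    return min(max(mean, 0.0), float(max_degree))


def sample_network(n_nodes, mean_degree, rng):
    # Step 2 (stand-in model): a one-block SBM whose edge probability
    # matches the noisy mean degree.
    p = min(1.0, mean_degree / (n_nodes - 1))
    neighbors = [set() for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if rng.random() < p:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def simulate_sis(neighbors, beta, gamma, n_seeds, n_steps, rng):
    # Step 3: discrete-time agent-based SIS. Per step, each infected node
    # infects each susceptible neighbor with probability beta and returns
    # to the susceptible state with probability gamma.
    n = len(neighbors)
    infected = set(rng.sample(range(n), n_seeds))
    prevalence = [len(infected) / n]
    for _ in range(n_steps):
        newly_infected, recovered = set(), set()
        for i in infected:
            for j in neighbors[i]:
                if j not in infected and rng.random() < beta:
                    newly_infected.add(j)
            if rng.random() < gamma:
                recovered.add(i)
        infected = (infected - recovered) | newly_infected
        prevalence.append(len(infected) / n)
    return prevalence


rng = random.Random(0)
reported_degrees = [0, 1, 1, 2, 3, 1, 0, 2]  # made-up egocentric reports
mean_deg = dp_mean_degree(reported_degrees, max_degree=5, epsilon=1.0, rng=rng)
network = sample_network(200, mean_deg, rng)
prevalence = simulate_sis(network, beta=0.2, gamma=0.1, n_seeds=5, n_steps=50, rng=rng)
```

Running the same simulation with the exact (non-private) mean degree and comparing the two prevalence curves gives a small-scale analogue of the comparison described in the abstract. The choice of `max_degree` trades bias from clipping against the amount of noise added.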