AI Summary
To address the dual challenges of data silos and privacy constraints in localized infectious disease forecasting, this paper proposes a collaborative modeling framework that integrates client-level differential privacy (CDP) with federated learning. Without sharing raw health data, county-level jurisdictions jointly train a sliding-window time-series prediction model: each client trains a local MLP on recent case counts, clips gradients, and injects CDP noise; the server further adds calibrated DP noise during aggregation to strengthen global privacy guarantees. Evaluated on county-level COVID-19 data from two epidemic phases, the method achieves R² = 0.94 (MAPE = 26%) and R² = 0.88 (MAPE = 21%) under a moderate privacy budget (ε = 4), approaching the performance of non-private baselines. The key contribution is the first deep integration of CDP into the federated epidemiological forecasting pipeline, enabling high-accuracy, cross-jurisdictional modeling under strong formal privacy guarantees.
Abstract
In times of epidemics, a swift reaction is necessary to mitigate the spread of disease. Localized approaches have several advantages here: they limit the resources required and reduce the impact of interventions on a larger scale. However, training a separate machine learning (ML) model at the local scale is often not feasible because too little data is available. Centralizing the data is also challenging because of its high sensitivity and the associated privacy constraints. In this study, we consider a localized strategy based on the German counties and communities managed by the corresponding local health authorities (LHAs). So that preserving privacy does not come at the cost of detailed situational data, we propose a privacy-preserving forecasting method that can assist public health experts and decision makers. ML methods with federated learning (FL) train a shared model without centralizing raw data. Treating the counties, communities, or LHAs as clients and seeking a balance between utility and privacy, we study an FL framework with client-level differential privacy (DP). We train a shared multilayer perceptron on sliding windows of recent case counts to forecast the number of cases, while clients exchange only norm-clipped updates and the server aggregates the updates with DP noise. We evaluate the approach on county-level COVID-19 data during two phases. As expected, very strict privacy yields unstable, unusable forecasts. At a moderately strong privacy level, the DP model closely approaches the non-DP model: $R^2 = 0.94$ (vs. 0.95) and a mean absolute percentage error (MAPE) of 26% in November 2020; $R^2 = 0.88$ (vs. 0.93) and a MAPE of 21% in March 2022. Overall, client-level DP-FL can deliver useful county-level predictions with strong privacy guarantees, and viable privacy budgets depend on the epidemic phase, allowing privacy-compliant collaboration among health authorities for local forecasting.
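The pipeline described above (sliding windows of case counts, norm-clipped client updates, noisy server aggregation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `noise_multiplier` parameter, and the choice of Gaussian noise are assumptions made for illustration, since the abstract only states that the server adds "DP noise". Model updates are represented as flat vectors, and local MLP training is omitted.

```python
import numpy as np


def sliding_windows(series, window):
    """Split a 1-D case-count series into input windows and next-step targets."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y


def clip_update(update, clip_norm):
    """Rescale a client update so that its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))


def dp_aggregate(updates, clip_norm, noise_multiplier, rng):
    """Sum clipped client updates, add Gaussian noise, and average.

    Assumption: Gaussian noise with standard deviation
    noise_multiplier * clip_norm is added to the sum (hypothetical
    parameterization; the source does not specify the mechanism).
    """
    clipped = [clip_update(u, clip_norm) for u in updates]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(updates)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = sliding_windows([10, 12, 15, 19, 24, 30], window=3)
    print(X.shape, y.shape)  # three windows of length 3, three targets
    # Hypothetical flattened updates from three clients.
    client_updates = [rng.normal(size=8) for _ in range(3)]
    print(dp_aggregate(client_updates, clip_norm=1.0, noise_multiplier=0.5, rng=rng))
```

Clipping bounds how much any single client can move the shared model, which is what lets the noise scale be tied to `clip_norm`. Translating a given `noise_multiplier` into a privacy budget requires a separate privacy accountant, which this sketch does not include.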