🤖 AI Summary
This study addresses the challenge of early prediction of national-level SARS-CoV-2 case surges. To model heterogeneous, multi-source temporal data, we propose the first cross-national, phase-aware multimodal fusion framework, integrating viral genomic sequences, public health policies, and human behavioral signals (e.g., mobility patterns and search indices). Our method combines temporal feature engineering, XGBoost classification, and SHAP-based interpretability analysis. The key contribution is uncovering a dual heterogeneity in modality importance: behavioral data provide the most timely early warning (1–2 weeks ahead) of surge onset, while genomic mutation features are specific to identifying variant-driven surges; moreover, the optimal modality combination varies with national development level and epidemic phase. The model achieves a maximum AUC of 0.89, significantly outperforming unimodal baselines, and establishes a methodological paradigm and empirical foundation for context-adaptive pandemic forecasting.
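As a rough illustration of the temporal feature engineering step mentioned above, the sketch below builds lagged behavioral features and aligns them with a binary surge label. The function names, the lag choices (1 and 2 weeks), and the surge definition (week-over-week case growth of at least 50%) are assumptions made for illustration only; the summary does not specify how the study defines surges or constructs its features.

```python
from typing import Dict, List, Sequence, Tuple


def make_lag_features(series: Sequence[float], lags: Sequence[int]) -> List[Dict[str, float]]:
    """Build one feature row per week t, using only values from earlier weeks (t - lag)."""
    max_lag = max(lags)
    return [
        {f"lag_{k}": series[t - k] for k in lags}
        for t in range(max_lag, len(series))
    ]


def label_surges(cases: Sequence[float], growth_threshold: float = 1.5) -> List[int]:
    """Label week t (for t >= 1) as a surge if cases grew by the threshold factor.

    Hypothetical definition: the threshold of 1.5 is an assumption, not the study's.
    """
    labels = []
    for t in range(1, len(cases)):
        prev = cases[t - 1]
        labels.append(1 if prev > 0 and cases[t] >= growth_threshold * prev else 0)
    return labels


def build_dataset(
    signal: Sequence[float],
    cases: Sequence[float],
    lags: Sequence[int] = (1, 2),
) -> Tuple[List[Dict[str, float]], List[int]]:
    """Align lagged signal features with surge labels for the same target week."""
    if len(signal) != len(cases):
        raise ValueError("signal and cases must cover the same weeks")
    features = make_lag_features(signal, lags)
    # Labels start at week 1, features at week max(lags); drop the unmatched labels.
    labels = label_surges(cases)[max(lags) - 1:]
    return features, labels


# Made-up weekly values, purely for demonstration.
mobility = [10, 12, 15, 20, 18]
weekly_cases = [100, 110, 120, 200, 210]
X, y = build_dataset(mobility, weekly_cases)
```

Rows of this shape could then be passed to a gradient-boosted classifier such as XGBoost, which is the model family the summary names.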
📝 Abstract
The COVID-19 pandemic response relied heavily on statistical and machine learning models to predict key outcomes such as case prevalence and fatality rates. These predictions were instrumental in enabling timely public health interventions that helped break transmission cycles. While most existing models are grounded in traditional epidemiological data, the potential of alternative datasets, such as those derived from genomic information and human behavior, remains underexplored. In the current study, we investigated the usefulness of diverse modalities of feature sets in predicting case surges. Our results highlight the relative effectiveness of biological (e.g., mutations), public health (e.g., case counts, policy interventions) and human behavioral features (e.g., mobility and social media conversations) in predicting country-level case surges. Importantly, we uncover considerable heterogeneity in predictive performance across countries and feature modalities, suggesting that surge prediction models may need to be tailored to specific national contexts and pandemic phases. Overall, our work highlights the value of integrating alternative data sources into existing disease surveillance frameworks to enhance the prediction of pandemic dynamics.